Troubleshooting, Restoring and Restarting AutoUpgrade

What if … something goes wrong during your database upgrade with AutoUpgrade? Of course, this never happens. But just in case, no worries – one of the really big advantages of AutoUpgrade (and the command line upgrade with “dbupgrade” as well) is its resumability and restartablity. But in this blog post I would like to cover a bit more. I will show you more details on Troubleshooting, Restoring and Restarting AutoUpgrade.

Troubleshooting, Restoring and Restarting AutoUpgrade

Photo by Mark Boss on Unsplash

And as often, the cause for this blog post is an email from Hervé Schweitzer, the CTO of dbi-services. Hervé told me about a failed upgrade at a customer where they ran out of shared pool. He fixed the problem already, I’d say with a brutal slam with a hammer – but he asked my what would be the ideal and recommended way to solve it.

This gives me the opportunity to write about something I planned to write for a long time. And actually, the blog post became a bit longer than I initially planned for.

Please open an SR

No worries, this is my standard answer. We need the logs in order to be able to say anything useful. And my wonderful team mates built a very nice and useful mechanism into AutoUpgrade which collects the logs, and zips them together. This is very handy in case you want to open an SR.

AutoUpgrade has been built from second One in a way to make sure it logs everything. Having adequate and readable (more or less) log information allows not only our Support mates but also ourselves to figure out what has gone wrong (or right). If you don’t log, you’ll be lost.

Now, when you open an SR, please do the following:

  1. Double-check that you’ve use the newest version of AutoUpgrade:
    $ java -jar $OH19/rdbms/admin/autoupgrade.jar -version
    build.hash 6010a62
    build.version 19.8.1
    build.date 2020/04/01 16:18:24
    build.max_target_version 19
    build.type production

    If not, please download the newest version from MOS Note: 2485457.1 and repeat your test. At the moment, we release an updated and improved version every 4-6 weeks. Sometimes we deliver fix overnight if necessary. Hence, it’s really key that you check to have the newest version at first. At the end of MOS Note: 2485457.1 you will find a changelog which tells you exactly what we fixed in which version.
    .

  2. Then zip the logs together:
    $ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -zip
    Path does not exist at '/home/oracle/scripts/alert_DB12.log' This path will not be zipped. Please review your configuration file.
    Processing: 344847 [bytes] in 20 Files 
    | 0%
    Zipped successfully at /home/oracle/scripts/AUPG_200407_1334_642.zip

    At the moment you will see a message regarding the alert.log not being included. We will fix this soon. But meanwhile, please upload at least 2 days of your alert.log to MOS as well in addition.

    Or you do the following and change to the directory where the alert.log is located in:

    $ cd $ORACLE_BASE/diag/rdbms/db12/DB12/trace
    [DB12] oracle@hol:/u01/app/oracle/diag/rdbms/db12/DB12/trace
    $ java -jar $OH19/rdbms/admin/autoupgrade.jar -config /home/oracle/scripts/DB12.cfg -zip
    Processing: 1224834 [bytes] in 21 Files 
    | 0%
    Zipped successfully at /u01/app/oracle/diag/rdbms/db12/DB12/trace/AUPG_200407_1339_531.zip

    And then it will be included, too.
    .

  3. Then open an SR:
    Go to MyOracle Support and create an SR. Please don’t wonder about outdated choices – this is not in our control. And the correct routing will work anyways.
    .

What if … you lost the job console?

This is a scenario but also a question I received quite often in the hands-on labs we’ve did at OOW and during workshop. So let me do a quick exercise. I will start the upgrade, and then terminate the job console.

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode deploy
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg>

The upgrade is now running. It starts with the prechecks and the fixups. I’m checking it a while later:

upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 101|   DB12|DBUPGRADE|EXECUTING|RUNNING|20/04/07 14:34|     N/A|14:41:57|5%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1

Killing the console

Now I “kill” the console – and I’m back at the command prompt.

How do you “kill” the console? The normal way would be to type the “exit” command and confirm with “y”. When you check the alert.log, you will recognize that the database gets shutdown at this point.

But I tried other ways, too, such as sending a “kill -9” to the process or trying CTRL+C and CTRL+V – and I received mixed results. In some cases the database was still up, in others it went down as well.

Upgrade stopped

In any case, my upgrade gets stopped the hard way. I’m checking the main worker’s log file in /home/oracle/logs/DB12/101/dbupgrade:

****************   Catproc Tables   ****************
Parallel Phase #:9    [DB12] Files:67
Unexpected error encountered in catctlMain; Error Stack Below; exiting
Died at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 18162.
 at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 18162.
        catcon::catcon_HandleSigINT("INT") called at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 9932
        eval {...} called at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 9932
        catcon::next_proc(4, 4, 3, undef, "/home/oracle/logs/DB12/101/dbupgrade/catupgrd20200407143441db12", ARRAY(0x2b983e8), 0, ARRAY(0x2b98298), ...) called at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 10146
        catcon::pickNextProc(4, 4, 3, "/home/oracle/logs/DB12/101/dbupgrade/catupgrd20200407143441db12", ARRAY(0x2b983e8), 0, ARRAY(0x2b98298), "/home/oracle/logs/DB12/101/dbupgrade/catupgrd20200407143441db"..., ...) called at /u01/app/oracle/product/19/rdbms/admin/catcon.pm line 15287

Uh … this looks pretty ugly. My upgrade died.

But wasn’t command line upgrade always resumable? Of course it is. As is AutoUpgrade.
Let me call the command again:

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode deploy
Previous execution found loading latest data
Total jobs recovered: 1
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
Type 'help' to list console commands

You may recognize the lines I marked in red. AutoUpgrade knows about the previous executions due to the log information kept under job number 101 in your log tree.

And here we go – AutoUpgrade is seamless resuming my upgrade.

upg> lsj
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
| 101|   DB12|DBUPGRADE|EXECUTING|RUNNING|20/04/07 14:34|     N/A|14:49:18|8%Upgraded |
+----+-------+---------+---------+-------+--------------+--------+--------+-----------+
Total jobs 1

This tool isn’t clearly dumb.

Now let me show you another scenario.

What if … your upgrade failed?

There may be various situations where you’ve got stuck or you simply want to have AutoUpgrade start from the beginning again. As you saw in my previous example AutoUpgrade will resume the upgrade from where it has been stopped. But if you’d like to have it start from the beginning again, you have to erase its memory.

The unofficial way would be to simply clean up the entire log tree.

But before doing such things, wait a bit. In which state is the database when the upgrade got stopped? It is still in UPGRADE mode and started in the future environment? Set your environment correctly for the Oracle Home you are upgrading to. And then connect to the database. If the database is down, start it up in UPGRADE mode.

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Apr 7 21:42:35 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup upgrade
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/19/dbs/initDB12.ora'

Oh. The SPFILE hasn’t been copied to the new home. But no worries, the AutoUpgrade tool has created the init.ora for after the upgrade already. You will find the files in this directory under your log tree:

[DB12] oracle@hol:~/logs/DB12/temp
$ ls -lrt
total 12
-rwx------. 1 oracle dba 1391 Apr  7 12:46 before_upgrade_pfile_DB12.ora
-rwx------. 1 oracle dba 1391 Apr  7 12:46 after_upgrade_pfile_DB12.ora
-rwx------. 1 oracle dba 1417 Apr  7 21:24 during_upgrade_pfile_DB12.ora
-rwx------. 1 oracle dba    0 Apr  7 21:28 db12.restart

Use this file marked in red to start the database in STARTUP UPGRADE mode:

SQL> startup upgrade pfile=/home/oracle/logs/DB12/temp/after_upgrade_pfile_DB12.ora
ORACLE instance started.

Total System Global Area 1258290752 bytes
Fixed Size		    8896064 bytes
Variable Size		  318767104 bytes
Database Buffers	  922746880 bytes
Redo Buffers		    7880704 bytes
Database mounted.
Database opened.
SQL>

So before doing anything, you’ll have to differ whether you ran the upgrade with upg1.restoration=yes (the default when your database is in ARCHIVELOG mode) or with upg1.restoration=no. In the latter case, no guaranteed restore point has been written.

Let us check:

SQL> column name format a35
SQL> column guarantee_flashback_database format a30
SQL> select name, guarantee_flashback_database from v$restore_point;

NAME				 GUARANTEE_FLASHBACK_DATABASE
-------------------------------- ------------------------------
AUTOUPGRADE_221145114461854_DB12 YES

Ok, this means your database is in ARCHIVELOG mode, AutoUpgrade has set a GRP – and you can flashback to it. You can do this from within SQL*Plus or – of course – in AutoUpgrade as well. I will do the latter to demonstrate it:

Flash me back, AutoUpgrade!

At first, I attach back to AutoUpgrade, and it will automatically resume.

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode deploy
Previous execution found loading latest data
Total jobs recovered: 1
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
Type 'help' to list console commands

Then I will abort the job.

upg> abort -job 101
Are you sure you want to abort job [101] ? [y|N] y
Abort job: [101][DB12]
upg> 
-------------------------------------------------
Errors in database [DB12]
Stage     [DBUPGRADE]
Operation [STOPPED]
Status    [ERROR]
Info    [
Error: UPG-1419
[Unexpected Exception Error]
Cause: This indicates that the upgrade has been killed, by an abort or restore command.
For further details, see the log file located at /home/oracle/logs/DB12/101/autoupgrade_20200407_user.log]

-------------------------------------------------
Logs: [/home/oracle/logs/DB12/101/autoupgrade_20200407_user.log]
-------------------------------------------------
upg> lsj
+----+-------+---------+---------+------+--------------+--------+--------+--------+
|Job#|DB_NAME|    STAGE|OPERATION|STATUS|    START_TIME|END_TIME| UPDATED| MESSAGE|
+----+-------+---------+---------+------+--------------+--------+--------+--------+
| 101|   DB12|DBUPGRADE|  STOPPED| ERROR|20/04/07 21:22|     N/A|21:58:34|UPG-1419|
+----+-------+---------+---------+------+--------------+--------+--------+--------+
Total jobs 1

And finally I will restore it, which will complete the flashback to the guaranteed restore point.

upg> restore -job 101
The job 101[DB12] on stage [DBUPGRADE] has the status [ABORTED]
Are you sure you want to restore? all progress will be lost [y/N] y
upg> lsj
+----+-------+----------+---------+-------+--------------+--------+--------+----------------+
|Job#|DB_NAME|     STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|         MESSAGE|
+----+-------+----------+---------+-------+--------------+--------+--------+----------------+
| 101|   DB12|GRPRESTORE|EXECUTING|RUNNING|20/04/07 21:22|     N/A|22:00:52|Restore Phase[4]|
+----+-------+----------+---------+-------+--------------+--------+--------+----------------+
Total jobs 1
upg> lsj
+----+-------+-----+---------+--------+--------------+--------+--------+-----------+
|Job#|DB_NAME|STAGE|OPERATION|  STATUS|    START_TIME|END_TIME| UPDATED|    MESSAGE|
+----+-------+-----+---------+--------+--------------+--------+--------+-----------+
| 101|   DB12|SETUP|  STOPPED|FINISHED|20/04/07 21:22|     N/A|22:01:17|DB Restored|
+----+-------+-----+---------+--------+--------------+--------+--------+-----------+
Total jobs 1

This happens really fast.

No restore point, no flashback!

If you ran the upgrade with upg1.restoration=no, then no guaranteed restore point has been written. You have two choices now. Either restore your backup and attempt the upgrade again, or try to resume it.

If you’d like to resume it, you follow the advice above. Start the database in the new environment in STARTUP UPGRADE mode, then call AutoUpgrade with -mode deploy again to resume the upgrade from where it has been stopped.

Start me up!

If you flashed the database back, you can restart the upgrade now again. If you’d like to pertain the previous logs, you may want the next run to have a new job number. For this purpose, the -clear_recovery_data flag exists. It will erase AutoUpgrade’s memory about the previous runs:

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode deploy -clear_recovery_data
The recovery data was removed, the AutoUpgrade will start from the beginning
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg>

The new run gets a new job number when you used the -clear_recovery_data flag:

upg>lsj
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME|END_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
| 102|   DB12|PRECHECKS|PREPARING|RUNNING|20/04/07 22:15|     N/A|22:15:29|Loading DB info|
+----+-------+---------+---------+-------+--------------+--------+--------+---------------+
Total jobs 1
upg>

Summary

In brief, the important question is: Did you run your upgrade with the default of having a guaranteed restore point, or not. You may need to start the database in the new environment with STARTUP UPGRADE by using the init.ora AutoUpgrade did create initially.

From there you can decide whether you’d like to flashback to the guaranteed restore point, either manually or within AutoUpgrade. Or if you’d like to resume the upgrade from where it has been stopped.

If you want to erase the memory of AutoUpgrade, use the -clear_recovery_data flag when you call AutoUpgrade with -mode deploy again.

Additional Information and Links

–Mike

 

Share this: