AutoUpgrade and the Partial Offline Backup Strategy

In our Fallback talk we promote a backup strategy for large DWHs and database which are on purpose in NOARCHIVELOG mode. But last week I learned that there is something important to know about AutoUpgrade and the Partial Offline Backup Strategy. Otherwise you may hit a

AutoUpgrade and the Partial Offline Backup Strategy

Photo by Esri Esri on Unsplash

pitfall.

 

Partial Offline Backup Strategy?

I won’t explain in all details what this is. If you’d like to read more, please find all the necessary information on my blog post from July 17, 2018:

with graphics included. In brief, you take an offline backup of the “heart” of the database. And in case of failure (or testing), you’ll restore it. The key is that your data tablespaces will be read only while you take this offline backup. And we call it “partial” as you will backup (and restore) only the core pieces of the database.

What happened?

A customer mailed me the other week sending me his test results. He realized that – despite the fact he set his USERS tablespace read-only before the upgrade – the datafile received updates. And even worse, when he restored the partial offline backup afterwards, the database complained about the USERS tablespace’s datafiles not matching the controlfile.

Of course, at first I thought: Must be a user error.

And while I tested quickly in my environment, I hit exactly the same problem.

But with the help of our team I found out what causes the issue.

A quick test run

Let me do a very simple and quick test with my DB12 database, an Oracle 12.2.0.1 database. But it actually doesn’t matter which database I use for my tests.

At first, I switch my only data tablespace read-only:

SQL*Plus: Release 12.2.0.1.0 Production on Tue Aug 18 11:36:46 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> alter tablespace USERS read only;

Tablespace altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

As next step, I backup all datafiles for the core tablespaces, i.e. SYSTEM, SYSAUX, UNDO, TEMP and potentially old repository tablespaces which don’t exist in my database. I also backup the controlfiles and the redologs:

$ cd /u02/oradata/DB12/
[DB12] oracle@hol:/u02/oradata/DB12
$ mkdir bck
[DB12] oracle@hol:/u02/oradata/DB12
$ ls
bck            redo01.log  redo03.log    system01.dbf  undotbs01.dbf
control01.ctl  redo02.log  sysaux01.dbf  temp01.dbf    users01.dbf
[DB12] oracle@hol:/u02/oradata/DB12
$ cp con* ./bck
[DB12] oracle@hol:/u02/oradata/DB12
$ cp redo* ./bck
[DB12] oracle@hol:/u02/oradata/DB12
$ cp sys* ./bck
[DB12] oracle@hol:/u02/oradata/DB12
$ cp undo* ./bck
[DB12] oracle@hol:/u02/oradata/DB12
$ cp temp* ./bck
[DB12] oracle@hol:/u02/oradata/DB12
$ ls -lrt ./bck
total 2614888
-rw-r-----. 1 oracle dba  10600448 Aug 18 11:39 control01.ctl
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:39 redo01.log
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:39 redo02.log
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:39 redo03.log
-rw-r-----. 1 oracle dba 576724992 Aug 18 11:40 sysaux01.dbf
-rw-r-----. 1 oracle dba 796925952 Aug 18 11:40 system01.dbf
-rw-r-----. 1 oracle dba 618668032 Aug 18 11:40 undotbs01.dbf
-rw-r-----. 1 oracle dba  65019904 Aug 18 11:40 temp01.dbf

And then I invoke AutoUpgrade, at first as usual with an “-mode analyze” run. But before I can do this, I need to start my database again:

SQL*Plus: Release 12.2.0.1.0 Production on Tue Aug 18 11:42:35 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1258291200 bytes
Fixed Size		    8620224 bytes
Variable Size		  452986688 bytes
Database Buffers	  788529152 bytes
Redo Buffers		    8155136 bytes
Database mounted.
Database opened.
SQL>

Now I can execute an “analyze”:

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode analyze
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be analyzed
Type 'help' to list console commands
upg> Job 100 completed
------------------- Final Summary --------------------
Number of databases            [ 1 ]

Jobs finished successfully     [1]
Jobs failed                    [0]
Jobs pending                   [0]
------------- JOBS FINISHED SUCCESSFULLY -------------
Job 100 for DB12

And afterwards, I check the HTML file in my log directory: ../DB12/100/prechecks/db12_prechecks.html. It shows me one ERROR:

“Modify the tablespace(s) below with ALTER TABLESPACE command to allow write operations before starting database upgrade. Not doing so can lead to failures during upgrade.”

And further:

All default tablespaces storing Oracle-maintained dictionary objects must allow write operations during database upgrade. There is one or more tablespaces with currently does not allow write operations.

It lists the USERS tablespace.

AutoUpgrade and the Partial Offline Backup Strategy

Now I double-check to verify that there is not a single Oracle maintained object in this tablespace:

SQL> select segment_name, owner from dba_segments where tablespace_name='USERS';

no rows selected

This looks good – I’m confident that the USERS tablespace can remain read-only as there are no Oracle maintained objects in it. Actually, it is pretty empty with no objects in it.

Still, I wondered a bit about why AutoUpgrade marks this error with Fixup Available: Yes?

AutoUpgrade and the Partial Offline Backup Strategy

Actually, I should have paid a bit more attention to this message – as I found out a few minutes later.

Look at the timestamp of my USERS tablespace (users01.dbf) before I invoke “-mode deploy” to allow AutoUpgrade to upgrade my database:

-rw-r-----. 1 oracle dba  65019904 Jan 21  2020 temp01.dbf
-rw-r-----. 1 oracle dba   5251072 Aug 18 11:37 users01.dbf
drwxr-xr-x. 2 oracle dba      4096 Aug 18 11:40 bck
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:42 redo02.log
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:42 redo03.log
-rw-r-----. 1 oracle dba 576724992 Aug 18 11:49 sysaux01.dbf
-rw-r-----. 1 oracle dba 618668032 Aug 18 11:51 undotbs01.dbf
-rw-r-----. 1 oracle dba 796925952 Aug 18 11:51 system01.dbf
-rw-r-----. 1 oracle dba 209715712 Aug 18 11:51 redo01.log
-rw-r-----. 1 oracle dba  10600448 Aug 18 11:51 control01.ctl

It states: 11:37.

I start AutoUpgrade on this database:

$ java -jar $OH19/rdbms/admin/autoupgrade.jar -config DB12.cfg -mode deploy
AutoUpgrade tool launched with default options
Processing config file ...
+--------------------------------+
| Starting AutoUpgrade execution |
+--------------------------------+
1 databases will be processed
Type 'help' to list console commands
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+-------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME| UPDATED|      MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+-------------+
| 101|   DB12|PREFIXUPS|EXECUTING|RUNNING|20/08/18 11:53|11:54:02|Remaining 3/4|
+----+-------+---------+---------+-------+--------------+--------+-------------+
Total jobs 1

And as usual, I monitor my upgrade with a python SimpleHTTPServer:

AutoUpgrade and the Partial Offline Backup Strategy

 

Restore – Ouch!

I like to test my fallback strategy – hence, I will stop the upgrade. And as I ran with upg1.restoration=NO, there is no option to restore within AutoUpgrade to a guaranteed restore point. I will do this by myself.

upg> abort -job 101
Are you sure you want to abort job [101] ? [y|N] y
Abort job: [101][DB12]
upg> 
-------------------------------------------------
Errors in database [DB12]
Stage     [DBUPGRADE]
Operation [STOPPED]
Status    [ERROR]
Info    [
Error: UPG-1419
[Unexpected exception error]
Cause: Database upgrade job has been killed, by abort or restore
For further details, see the log file located at /home/oracle/logs/DB12/101/autoupgrade_20200818_user.log]

-------------------------------------------------
Logs: [/home/oracle/logs/DB12/101/autoupgrade_20200818_user.log]
-------------------------------------------------
upg> lsj
+----+-------+---------+---------+-------+--------------+--------+---------------+
|Job#|DB_NAME|    STAGE|OPERATION| STATUS|    START_TIME| UPDATED|        MESSAGE|
+----+-------+---------+---------+-------+--------------+--------+---------------+
| 101|   DB12|DBUPGRADE|  STOPPED|ABORTED|20/08/18 11:53|12:11:34|Job 101 aborted|
+----+-------+---------+---------+-------+--------------+--------+---------------+
Total jobs 1

The database is shutdown.

I restore my files:

[DB12] oracle@hol:/u02/oradata/DB12
$ cp ./bck/* .

Do you recognize the changed timestamp for users01.dbf? It was 11:37 before – but not it is 12:09.

$ ls -lrt
total 2620020
drwxr-xr-x. 2 oracle dba      4096 Aug 18 11:40 bck
-rw-r-----. 1 oracle dba   5251072 Aug 18 12:09 users01.dbf
-rw-r-----. 1 oracle dba  10600448 Aug 18 12:14 control01.ctl
-rw-r-----. 1 oracle dba 209715712 Aug 18 12:14 redo01.log
-rw-r-----. 1 oracle dba 209715712 Aug 18 12:14 redo02.log
-rw-r-----. 1 oracle dba 209715712 Aug 18 12:14 redo03.log
-rw-r-----. 1 oracle dba 576724992 Aug 18 12:14 sysaux01.dbf
-rw-r-----. 1 oracle dba 796925952 Aug 18 12:14 system01.dbf
-rw-r-----. 1 oracle dba  65019904 Aug 18 12:14 temp01.dbf
-rw-r-----. 1 oracle dba 618668032 Aug 18 12:14 undotbs01.dbf

This will cause trouble, I’m pretty sure.

And I try to start my database:

SQL> startup
ORACLE instance started.

Total System Global Area 1258291200 bytes
Fixed Size		    8620224 bytes
Variable Size		  452986688 bytes
Database Buffers	  788529152 bytes
Redo Buffers		    8155136 bytes
Database mounted.
ORA-01122: database file 4 failed verification check
ORA-01110: data file 4: '/u02/oradata/DB12/users01.dbf'
ORA-01207: file is more recent than control file - old control file

I expected this as the user01.dbf file had seen updates during upgrade. But how could this happen?

AutoUpgrade switched the tablespace

It’s correct. AutoUpgrade did alter the tablespace read-write again. It told me already about it: “Fixup Available: Yes“. And I ignored it.

But why?

My first assumption was: Default tablespace for Oracle maintained users. I did a quick check after reverting back to my snapshot.

SQL> select username, default_tablespace from dba_users 
  2  where ORACLE_MAINTAINED='Y' 
  3  order by 2,1;

USERNAME			 DEFAULT_TABLESPACE
-------------------------------- --------------------
ANONYMOUS			 SYSAUX
APPQOSSYS			 SYSAUX
DBSFWUSER			 SYSAUX
DBSNMP				 SYSAUX
GGSYS				 SYSAUX
GSMADMIN_INTERNAL		 SYSAUX
MDSYS				 SYSAUX
ORDDATA 			 SYSAUX
ORDPLUGINS			 SYSAUX
ORDSYS				 SYSAUX
SI_INFORMTN_SCHEMA		 SYSAUX
WMSYS				 SYSAUX
XDB				 SYSAUX
LBACSYS 			 SYSTEM
OJVMSYS 			 SYSTEM
OUTLN				 SYSTEM
SYS				 SYSTEM
SYS$UMF 			 SYSTEM
SYSTEM				 SYSTEM
XS$NULL 			 SYSTEM
AUDSYS				 USERS
DIP				 USERS
GSMCATUSER			 USERS
GSMUSER 			 USERS
ORACLE_OCM			 USERS
REMOTE_SCHEDULER_AGENT		 USERS
SYSBACKUP			 USERS
SYSDG				 USERS
SYSKM				 USERS
SYSRAC				 USERS

30 rows selected.

As next step, I did ALTER all Oracle maintained users with default tablespace USERS. I use two scripts, one to ALTER the default tablespace, and another one to revert. I post only the SQL here without the formatting and the spool commands.

select 'ALTER USER ' || username || ' DEFAULT TABLESPACE SYSAUX;' from dba_users where oracle_maintained='Y' and default_tablespace='USERS';

select 'ALTER USER ' || username || ' DEFAULT TABLESPACE USERS;' from dba_users where oracle_maintained='Y' and default_tablespace='USERS';

But this alone does not solve my problem. AutoUpgrade still changes the tablespace to read-write.

It’s the Default Permanent Tablespace

Well, from my team mates I learned then: We check for the default permanent tablespace. And this makes perfect sense. During upgrade, new users may be created. And these new users shouldn’t have the SYSTEM tablespace assigned as it was done in older releases of Oracle.

This is the important query:

SELECT PROPERTY_VALUE
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME = 'DEFAULT_PERMANENT_TABLESPACE';

And in my case, this query told me: USERS.

The Solution

When I ALTER the default tablespace before invoking AutoUpgrade, the ERROR disappears:

ALTER DATABASE DEFAULT TABLESPACE SYSAUX;

AutoUpgrade can upgrade my database now.

I’m still not 100% confident as I see all these Oracle-maintained users with the default tablespace USERS. So I highly recommend to either adjust the Oracle maintained users – or to include the USERS tablespace into the Partial Offline Backup as well. This way, you have less work – and all will go smoothly (unless your USERS tablespace is huge).

Summary

My example here is done with the tablespace USERS. But in your real-world scenario, there may be different and other tablespaces involved. So please check. You starting points are:

  • Which is your default permanent tablespace?
  • Which of the Oracle-maintained users have other default tablespaces than SYSTEM and SYSAUX?

And this blog post will be useful only in the case you plan to use the Partial Offline Backup strategy AND you will use AutoUpgrade. Neither dbupgrade nor DBUA will alter the tablespace as far as I’m aware. At worst case, you may get errors during the upgrade. But then you know that your fallback plan works. And this is what fallback strategies are made for, right?

In any case, please test this carefully before you rely on it for your 1200TB DWH.

AutoUpgrade does the right thing. You decided to let your database(s) upgrade unattended. And AutoUpgrade does the necessary steps.

But – and this is what we may improve – the message in the HTML file should be more instructive. That is in progress already.

Further Information and Links

–Mike

Share this: