I am blogging about this topic because you currently won’t find helpful information in My Oracle Support, and I have now seen this issue twice in a row at enterprise customers. AutoUpgrade may run into trouble when you are short on RAM: during one of the restarts AutoUpgrade initiates, you may get an ORA-600.
What you may see
The strange pattern here is that the alert.log does not give you any indication of this ORA-600. You will find it only in the logs written by autoupgrade.jar:
2020-05-29 17:15:03.254 ERROR
DATABASE NAME: hugo01
CAUSE: ERROR at Line 870491 in [/autoupgrade/hugo01/100/dbupgrade/catupgrd20200529161301hugo010.log]
REASON: ORA-00600: internal error code, arguments: , , , , , , , ,

catupgrd20200529161301hugo010.log
========================================
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
ERROR:
ORA-00600: internal error code, arguments: , , , , , , , , , , ,
IBM AIX RISC System/6000 Error: 13: Permission denied
Additional information: 8
Additional information: 51315686
SQL>
SQL> startup restrict pfile=/autoupgrade/hugo01/100/dbupgrade/catupgrd20200529161301hugo01_20200529170657_24577676.ora;
SP2-0640: Not connected
SQL> SP2-0640: Not connected
SQL> SP2-0640: Not connected
SQL>
SQL>
========== Process Terminated by catcon ==========
But in the alert.log you will see only this:
alert.log
============
2020-05-29T17:13:05.199299+02:00
Shutting down instance (immediate) (OS id: 657404)
Shutting down instance: further logons disabled
...
2020-05-29T17:13:45.070166+02:00
Instance shutdown complete (OS id: 657404)
2020-05-29T17:14:27.749130+02:00
Starting ORACLE instance (normal) (OS id: 52626574)
...
2020-05-29T17:14:43.791515+02:00
Database mounted in Exclusive Mode
Lost write protection disabled
...
(PID:6619978): Using STANDBY_ARCHIVE_DEST parameter default value as /.../arch/ [krsd.c:17775]
Completed: ALTER DATABASE MOUNT
2020-05-29T17:14:44.290773+02:00
ALTER DATABASE OPEN MIGRATE
...
2020-05-29T17:14:48.510007+02:00
CJQ0 started with pid=42, OS id=64947190
Completed: ALTER DATABASE OPEN MIGRATE
There is no indication of the ORA-600 anywhere in it.
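Because the error shows up only in the upgrade logs, a quick scan of the catupgrd log is one way to confirm it. The following is a minimal Python sketch and not part of AutoUpgrade; the function name find_ora_errors and the sample lines are made up for illustration.

```python
import re

# Matches Oracle error codes such as ORA-00600 or ORA-600.
ORA_PATTERN = re.compile(r"\bORA-0*(\d+)\b")


def find_ora_errors(lines, code=600):
    """Return (line_number, text) pairs for lines mentioning the given ORA code."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in ORA_PATTERN.finditer(line):
            if int(match.group(1)) == code:
                hits.append((number, line.rstrip()))
                break  # one hit per line is enough
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt modelled on the log shown above.
    sample = [
        "SQL> shutdown immediate;",
        "ORACLE instance shut down.",
        "ERROR:",
        "ORA-00600: internal error code, arguments: ...",
        "SP2-0640: Not connected",
    ]
    for number, text in find_ora_errors(sample):
        print(f"line {number}: {text}")
```

To scan a real log, pass an open file handle instead of the sample list, for example open(path, errors="replace"), since a file object yields its lines one by one.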
I saw the exact same pattern a few weeks ago at another customer on Exadata within an OVM. When I tried to debug that case, it looked to me like a memory shortage.
In the above case, my ACS colleague Gisela confirmed that this environment is short on RAM, too.
How do you solve it?
Today Gisela reported back to me that the solution was:
- Increase the RAM for this environment
- And because the following error had occurred as well:
KUP-04095: preprocessor command /opt/app/oracle/database/18.104.22.168.0_A/QOpatch/qopiprep.bat encountered error "pipe read timeout"
they set these underscores as a remedy:
And please, as a general rule, don’t set underscores blindly. I’m documenting the workaround here in case you run into the same error and land here via a search engine (and as a reminder to myself in case I need it someday).
I sometimes wonder about this: more and more virtual environments are consolidated together, and RAM assignments get tighter and tighter. Well, you already know that Multitenant would be the correct answer here, or Cloud.
Generally, please don’t forget that AutoUpgrade spawns processes in order to upgrade your database(s) quickly and unattended. It needs air (aka RAM) to breathe. In this particular case, a “resume” may not solve the problem unless you increase the RAM available to your virtual environment.
Further Information and Links
- AutoUpgrade Troubleshooting, Restarting and Restoring
- AutoUpgrade – Running two or more sessions in parallel
- AutoUpgrade may fail when patch ID column is NULL
I ran into this sort of problem when provisioning a second ExaCS database through the OCI console. The OCI console assumes it can have all the RAM for the db provision, so provisioning db no. 2 fails if the first db is running. I solved this by shutting down db1 while provisioning db2, resizing the db2 SGA, and then starting up both databases. It was more than slightly confusing. The logs, which are inside OCI, were also difficult to get at.
I sometimes had the error with autoupgrade versions older than 19.9.0 but never with version 19.9.0. My workaround was:
– Set the environment of the target version
– Shut down the instance
– Start up the instance in upgrade mode
– Restart autoupgrade
We can easily reproduce this error (ORA-00600) when upgrading from 18c to 19c. The server has 130 GB of memory. We set sga_target and sga_max_size to 80 G, so it’s unlikely we’re short of memory. We opened an SR (3-24223647361 for those that can read it). Oracle Support only says “This error was caused when trying to re-open the database to run the post upgrade fixups. The appears to be a resource issue. The error was detected by AutoUpgrade and we stop the post upgrade fixups from getting applied. In this case there is nothing AutoUpgrade can do to fix this resource issue.” But we checked all sorts of resources (OS and DB) and couldn’t identify which resource could possibly have run out. Since we don’t have errors like KUP-04095 (pipe read timeout), we didn’t set the underscore parameters suggested here. In the meantime, our upgraded database runs perfectly fine. We just don’t know which step, if any, was not run due to the error.
I was finally able to run autoupgrade successfully. I’m not positive, but I think unsetting ORACLE_HOME in my shell may have made the difference. We had ORACLE_HOME and PATH set in ~/.bash_profile pointing to the 18c paths. I unset them, as well as ORACLE_SID. The ORA-600 no longer occurred.
Interestingly, in $DBHOME/rdbms/log, there’s a trc file showing the full call stack for this ORA-600 (only one trace file, even though I tried multiple upgrades and most of them failed with this ORA-600 error). The stack is:
ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- kgeriv_int <- kgeriv <- kgeasi <- kspgip <- kslwt_fix_wait <- kslwaitctx <- kzan_open_osfile <- kzanOpenUnifiedAuditTrail <- kzaf_dump_audit <- kzaf_insert_audit_rec_to_file <- kzaf_write_audit_int <- kzan_write_prelim_record <- kzasydmp_new <- kpolpi <- kpoauth <- opiodr …
Function kspgip appears to be the top one before error handling. I can't find any relevant information about this stack. Since it mentions audit trail, I changed mixed mode auditing to pure unified auditing before the upgrade (`make -f ins_rdbms.mk uniaud_on ioracle'). But I don't think it's relevant to solving the problem.
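That reading of the stack can be reproduced mechanically. The following Python sketch skips the leading frames that look like dump or error-raising routines and returns the first remaining one. The prefix list (ksed, dbked, kge) is an assumption made for illustration based on the frame names in this trace, not an official classification, and the function name is made up.

```python
# Frame name prefixes assumed to belong to the dump/error-raising layer.
# This list is an illustrative assumption, not an official Oracle classification.
ERROR_HANDLING_PREFIXES = ("ksed", "dbked", "kge")


def first_non_error_frame(stack):
    """Return the first frame that is not an error-handling frame, or None."""
    frames = [frame.strip() for frame in stack.split("<-")]
    for frame in frames:
        if frame and not frame.startswith(ERROR_HANDLING_PREFIXES):
            return frame
    return None


if __name__ == "__main__":
    stack = (
        "ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- kgeriv_int <- kgeriv "
        "<- kgeasi <- kspgip <- kslwt_fix_wait <- kslwaitctx"
    )
    print(first_non_error_frame(stack))
```

With the stack from the trace file, this returns kspgip, which matches the manual reading.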
I also lowered SGA from 80G to 40G and lowered HugePages accordingly. Again, I don't think it's relevant.
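For reference, the HugePages count that goes with such an SGA change is simple arithmetic. Here is a minimal Python sketch, assuming the common Linux x86-64 huge page size of 2 MiB (check Hugepagesize in /proc/meminfo on the actual host). The helper name is made up for illustration, and the result is only a lower bound, because a real configuration usually needs some headroom on top of it.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def hugepages_for_sga(sga_gib, page_size_mib=2):
    """Return the minimum number of huge pages needed to back an SGA of sga_gib GiB."""
    sga_bytes = sga_gib * GIB
    page_bytes = page_size_mib * MIB  # 2 MiB is an assumption; verify on the host
    # Ceiling division so a partially filled page still counts as a full page.
    return -(-sga_bytes // page_bytes)


if __name__ == "__main__":
    for sga in (80, 40):
        print(f"SGA {sga} GiB -> at least {hugepages_for_sga(sga)} huge pages")
```

Under these assumptions, lowering the SGA from 80 GiB to 40 GiB halves the minimum from 40960 to 20480 pages.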