This is the right blog post for a Friday the 13th. And please forgive me: I wanted to put this on the blog earlier, as two of my customers hit this weeks ago already, but it must have fallen through the cracks. Still, it is hopefully not too late to tell you what you should do if you hit ORA-29702 and your instance does not start up in the cluster anymore, especially when you tested a database upgrade and, after a restore, the database doesn’t want to start no matter what you try.
You upgrade Grid Infrastructure / Oracle Clusterware (for instance to 19.7 or 19.8, but other versions may be affected as well).
Then you do a database upgrade, most likely of an 126.96.36.199 database, but it could also happen with a 188.8.131.52 database or a different version. After the upgrade has completed, you revert to your state “before upgrade” as you’d like to test it again. And it doesn’t matter whether you do an “autoupgrade.jar -restore”, a Flashback Database to a GRP, or a restore of your database backup.
Whatever you do, trying to start a database with the same name again will fail with:
SQL> startup nomount
ORA-29702: error occurred in Cluster Group Service operation
Now you investigate in the alert.log – and there you’ll find something like this snippet:
USER (ospid: 7777): terminating the instance due to error 29702
Instance terminated by USER, pid = 7777
You then check the Clusterware status and see:
InstAgent::startInstance 170 ORA-29701 or ORA-29702 or ORA-46365 shutdown abort
ORA-29702: error occurred in Cluster Group Service operation
InstAgent::startInstance 160 ORA-29701 or ORA-29702 or ORA-46365 instance dumping m_instanceType:1 m_lastOCIError:29702
Further investigation brings you to the cluster’s alert.log:
2020-10-13 15:02:31.596 [ORAAGENT(267248)]CRS-5017: The resource action "ora.abcdefg.db start" encountered the following error:
2020-10-13 15:02:31.596+ORA-01034: ORACLE not available
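If you want to check quickly whether a given log file shows this signature, a small shell helper can do it. This is only a minimal sketch: the function name has_ora29702 is made up for illustration, and the example path in the comment is an assumption that you will need to adjust to your own diagnostic destination.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Oracle tooling):
# succeeds (exit code 0) if the given log file contains one of the
# ORA-29702 signatures quoted in this post, fails otherwise.
has_ora29702() {
    # $1: path to a log file, e.g. the database alert.log or an agent trace
    # Matches both "ORA-29702" and the alert.log wording "due to error 29702".
    grep -Eq 'ORA-29702|due to error 29702' "$1"
}

# Example usage (path is an assumption, adjust it to your environment):
# has_ora29702 "$ORACLE_BASE/diag/rdbms/abcdefg/abcdefg1/trace/alert_abcdefg1.log" \
#     && echo "ORA-29702 signature found"
```

The helper only tells you that the error text is present. It cannot tell you on its own whether the bug described below is the cause.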
And I’d guess that by now you’ll be quite “excited”.
How do you solve this problem?
The problem is caused by Bug 31561819 (Incompatible maxmembers at CRSD Level Causing Database Instance Not Able to Start). And to be honest, you don’t even need to restore or flash back a database to hit this error. A simple instance in NOMOUNT state, without a single datafile, leads to the same error.
The bug is fixed from these RUs on:
- 184.108.40.206.201020 (Oct 2020) OCW RU
- 220.127.116.11.201020 (Oct 2020) OCW RU
- 18.104.22.168.201020 (Oct 2020) OCW RU
You can download the fix for lower-version platforms as well, but it seems to include the entire stack then.
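To see whether the fix is already in one of your homes, you can search the OPatch inventory for the bug number. The following is a hedged sketch, not an official procedure: the function name fix_listed is invented for illustration, GRID_HOME is an assumed environment variable, and which home you need to check (Grid Infrastructure or database) depends on where you applied the OCW RU.

```shell
#!/bin/sh
# Hypothetical helper: reads OPatch inventory output on stdin and
# succeeds (exit code 0) if bug 31561819 is listed as a whole number.
fix_listed() {
    # -w matches whole words only, so a longer number that merely
    # contains these digits does not count as a hit.
    grep -qw '31561819'
}

# Example usage (GRID_HOME is an assumption, point it at the home to check):
# "$GRID_HOME/OPatch/opatch" lsinventory -bugs_fixed | fix_listed \
#     && echo "Fix for bug 31561819 is listed in this home"
```

Reading from stdin keeps the check independent of how you collect the inventory, so you can also run it against a saved inventory listing from another node.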
As far as I can see, there is no MOS note about it. But since I worked with the two customers who hit this issue a while ago, and on whose behalf the bug was filed and fixed, I now receive an email about once a week from a customer running into this problem.
Further Information and Links
- MOS Note:31561819.8 – Bug 31561819 – Incompatible maxmembers at CRSD Level Causing Database Instance Not Able to Start
- Fix for Bug 31561819