This is the right blog post for a Friday the 13th. And please forgive me – I wanted to put this on the blog earlier, as two of my customers hit this weeks ago already. But it must have fallen through the cracks. Still, it is hopefully not too late to tell you what you should do if you hit ORA-29702 – and your instance does not start up in the cluster anymore. Especially when you tested a database upgrade – and after a restore, the database doesn’t want to start, no matter what you try.

What happens?
You upgrade Grid Infrastructure / Oracle Clusterware (for instance to 19.7 or 19.8, but other versions may be affected as well).
Then you do a database upgrade, most likely of an 11.2.0.4 database, but it could also happen with a 12.1.0.2 database or a different version. After the upgrade has completed, you revert to your state “before upgrade” as you’d like to test it again. And it doesn’t matter whether you do an “autoupgrade.jar -restore”, a Flashback Database to a GRP, or a restore of your database backup – as sketched below.
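Just to illustrate the restore path (a rough sketch only – the config file name, job number and restore point name are assumptions, not taken from a real environment):

java -jar autoupgrade.jar -config DB11.cfg -restore -jobs 100

-- or, when you flash back to a guaranteed restore point manually:
SQL> shutdown immediate
SQL> startup mount
SQL> flashback database to restore point BEFORE_UPGRADE;
SQL> alter database open resetlogs;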
Whatever you do, trying to start a database with the same name again will fail with:
SQL> startup nomount
ORA-29702: error occurred in Cluster Group Service operation
Now you check the alert.log – and there you’ll find something like this snippet:
USER (ospid: 7777): terminating the instance due to error 29702
Instance terminated by USER, pid = 7777
You then check the Clusterware status and see:
InstAgent::startInstance 170 ORA-29701 or ORA-29702 or ORA-46365 shutdown abort
ORA-29702: error occurred in Cluster Group Service operation
InstAgent::startInstance 160 ORA-29701 or ORA-29702 or ORA-46365 instance dumping m_instanceType:1 m_lastOCIError:29702
Further investigation brings you to the cluster’s alert.log:
2020-10-13 15:02:31.596 [ORAAGENT(267248)]CRS-5017: The resource action "ora.abcdefg.db start" encountered the following error:
2020-10-13 15:02:31.596+ORA-01034: ORACLE not available
And I’d guess, now you’ll be quite “excited”.
How do you solve this problem?
The problem happens because of Bug 31561819 – Incompatible maxmembers at CRSD Level Causing Database Instance Not Able to Start. And to be honest, you don’t even need to restore or flash back a database to hit this error. Simply starting an instance in NOMOUNT state, without any datafiles, leads to the same error.
The bug is fixed from these RUs on:
- 19.9.0.0.201020 (Oct 2020) OCW RU
- 18.12.0.0.201020 (Oct 2020) OCW RU
- 12.2.0.1.201020 (Oct 2020) OCW RU
You can download the fix for lower versions as well, but it then seems to include the entire stack.
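You can quickly check whether the fix (or the containing OCW RU) is already present in your Grid Infrastructure home with opatch – a simple example, the Grid home path below is just an assumption:

$ /u01/app/19.0.0/grid/OPatch/opatch lspatches
$ /u01/app/19.0.0/grid/OPatch/opatch lsinventory | grep 31561819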
As far as I can see, there is no MOS note about it. But since I worked with the two customers who hit this issue a while ago, and on whose behalf the bug was filed and fixed, I now receive an email about once a week from a customer running into this problem.
Further Information and Links
- MOS Note:31561819.8 – Bug 31561819 – Incompatible maxmembers at CRSD Level Causing Database Instance Not Able to Start
- Fix for Bug 31561819
–Mike
Hi Mike,
We ran into the same problem a few months ago on X8M, downgrading from 12.2 to 11.2. We used a kind of workaround: cloning the RDBMS binaries and relinking with the rac_off option, which at least helped us to start up the Oracle RAC One Node instances.
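(Just as a sketch for other readers – the relink with rac_off typically looks like this; paths and make targets depend on your environment and version:)

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle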
Best Regards,
GK
This is not a solution for a live production DB.
Of course, you should patch upfront.
And you would discover and hit this issue already during testing, not first in production.
Cheers,
Mike
We had a similar issue. After changing db_unique_name to a different value, we were able to start the database.
Yes, this will work.
Cheers,
Mike
Hi Mike,
Now it is really worrisome… I fixed this bug by applying the patch to the Grid and RDBMS homes. The next time I tested the upgrade again and restored it back, I got hit by the same bug again…
Look at this:
10:37:05 oracle@dexb501:+asm1 % ./opatch lsinventory | grep 31561819
Patch 31561819 : applied on Mon Feb 08 20:13:56 JST 2021
Patch description: "OCW Interim patch for 31561819"
31561819, 25736599, 26675491, 27148384, 27222128, 27572040, 27604329
10:37:24 oracle@dexb501:prise101 % cd /u01/ORACLE/baseDB/product/12.1.0.2/homeDB_rise/OPatch
10:37:35 oracle@dexb501:prise101 % ./opatch lsinventory | grep 31561819
Patch 31561819 : applied on Mon Feb 08 17:05:42 JST 2021
Patch description: "OCW Interim patch for 31561819"
21519340, 20218012, 21222147, 31561819, 26884984, 19551830, 19068333
SQL> startup nomount;
ORA-29702: error occurred in Cluster Group Service operation
alert log:
USER (ospid: 370524): terminating the instance due to error 29702
Thu Feb 25 10:07:02 2021
Instance terminated by USER, pid = 370524
What should we do? Even though I relinked the RDBMS home, it is still the same…
Regards,
Shah Firdous
Hi Shah,
sorry to see that – but please open an SR and check with Oracle Support.
Cheers,
Mike
Hi Mike,
Even if you apply the patch… the first time this workaround works, but the next time you will face the same issue, as I tested the upgrade and downgrade twice…
I found a solution to it:
Reboot all nodes one by one and then try it – it just works:
ORACLE instance started.
Total System Global Area 2.8991E+10 bytes
Fixed Size 6873256 bytes
Variable Size 5100277592 bytes
Database Buffers 2.3689E+10 bytes
Redo Buffers 194449408 bytes
Regards,
SHAH FIRDOUS
Hi Shah,
thanks – but I think the two customers I worked with who hit this issue first both didn’t have to reboot their servers; they restarted the entire Clusterware stack instead.
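For reference, a restart of the stack on each node (one after the other, as root) would roughly look like this – a sketch only, please verify the exact procedure for your version:

# crsctl stop crs
# crsctl start crs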
Cheers,
Mike
Hi Mike,
Thanks for your blog.
We ran into the same problem last weekend, upgrading a RAC database from 12.1 to 19c. We had problems with the upgrade, and when we tried to roll back and start the database with 12.1, we hit the ORA-29702.
The workaround that worked for us was to change the db_unique_name of the database.
After changing the db_unique_name, we were able to start the database with the same db_name in the 12.1 RDBMS.
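(For illustration only – a rough sketch of that workaround; the new value below is made up. db_unique_name is a static parameter, so it needs an spfile change plus an instance restart, and in a cluster the Clusterware resource has to match as well, e.g. via srvctl:)

SQL> alter system set db_unique_name='ABCDEFG_TEST' scope=spfile sid='*';
SQL> shutdown immediate
SQL> startup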
Hi Jose,
this is expected. But when you run upgrade tests multiple times, for instance, you can’t simply change the db_unique_name every time.
But you are right, changing the db_unique_name would be a workaround.
Cheers,
Mike