The other day I received a question from a colleague about the risk of GI and database not being on the same RU. I blogged about this a long while ago: we recommend that you keep them in sync, but you don't have to. At the same time, Ernst Leber asked me to blog about an issue he and his colleague ran into when upgrading to 19c in a RAC environment. AutoUpgrade hung. But he titled his email with "not an AutoUpgrade problem!". Still, I agree it is worth writing about the issue they saw. And thanks, Ernst!

Photo by Zane Persaud on Unsplash
What’s the issue?
Ernst and his colleague attempted a RAC upgrade. The database upgrade was run with AutoUpgrade. But it hung. So what happened?
Actually, they ran into a known issue: BUG 29580769 – LNX-193-AGENT: “SRVCTL MODIFY ASM -COUNT 3” COULD HANG AS LONG AS 11 MINS, AND CRS ORAAGENT.BIN WOULD COREDUMP AT THE SAME TIME.
Luckily they quickly found the MOS note giving advice on this issue:
- MOS Note: 2645911.1 ([srvctl stop database -database db_name] is hanging for 10 minutes)
- NOTE:29580769.8 – Bug 29580769 – “srvctl modify asm -count 3” Hang for 11 mins Coredump is generated
The issue is that a srvctl modify asm -count call hangs for 10 minutes. The instance connections weren't closed, and as a result, the shutdown hung. The workaround is to kill the hanging instance with a shutdown abort in SQL*Plus.
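If you are stuck in the hang yourself, this is a minimal sketch of the abort, assuming the local instance is named orcl1 (adjust the SID and environment to your setup):
$ export ORACLE_SID=orcl1
$ sqlplus / as sysdba
SQL> shutdown abort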
When is this fixed?
For most of you, this won't be an issue as it has been fixed starting with the 19.10.0 Release Update. The fix is included in 19.10.0 and all following Grid Infrastructure RUs. And this is the reason for my lengthy opening paragraph. No blame to anybody, but if Grid Infrastructure had been patched to 19.10.0 or 19.11.0 (or newer, in case you read this article after mid-July 2021), the issue wouldn't have happened. Just saying that this is why you may want to consider patching GI on the same schedule as your database.
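If you are unsure which RU your Grid Infrastructure home is on, a quick check is to list the installed patches from the grid home. The path below is just an example, so adjust it to your environment:
$ /u01/app/19.0.0/grid/OPatch/opatch lspatches
Look for the Release Update entries in the output; anything reporting 19.10.0 or higher already includes the fix.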
What is the workaround?
Now back to Ernst and his recommendation for the database upgrade in case you are on GI 19.9.0 or lower. This is actually the same recommendation you can read in MOS Note: 2645911.1.
- srvctl stop instance -db orcl -node node1
- Confirm that the instance on node1 is stopped and the first command has finished (see the status check sketched after this list)
- srvctl stop instance -db orcl -node node2
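And just to sketch the status check mentioned above, assuming the database is named orcl as in the commands:
$ srvctl status instance -db orcl -node node1
It should report that the instance on node1 is not running before you continue with node2.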
Ok, so far so good. But why does this appear on the Upgrade Blog?
AutoUpgrade in this case will start with the -analyze followed by the -fixups phase. It will collect stats and do all the other things AutoUpgrade does to make your life easier. But when it enters the drain phase, it will give you an error as it catches the timeout. AutoUpgrade initiates the srvctl commands. But as the shutdown hangs, AutoUpgrade won't progress for 10 minutes either.
And in this case you may sit around and wonder. Or blame AutoUpgrade for doing nothing and just giving you a timeout error.
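Just for reference, this is roughly how such a run looks. The config below is only a sketch with made-up paths and SID; the drain and shutdown happen during the deploy phase, which is where the hang would surface:
# sample.cfg (example values only, adjust homes, SID and log directory)
global.autoupg_log_dir=/home/oracle/autoupgrade
upg1.sid=orcl
upg1.source_home=/u01/app/oracle/product/12.2.0/dbhome_1
upg1.target_home=/u01/app/oracle/product/19.0.0/dbhome_1
$ java -jar autoupgrade.jar -config sample.cfg -mode analyze
$ java -jar autoupgrade.jar -config sample.cfg -mode deploy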
So the ultimate workaround is to have Grid Infrastructure patched to 19.10.0 or 19.11.0, or to the matching RURs, which contain the fix as well.
Additional Information – Data Guard
Ernst and his colleague sent me another question about whether this issue would affect a Data Guard switchover as well. I wasn't sure, so they tried it out themselves. And unfortunately, the answer is "Yes". So you need to patch GI to 19.10.0 or higher to avoid such issues.
8 minutes into the switchover, they received this message from the Broker:
DGMGRL> switchover to to19DG
Performing switchover NOW, please wait...
New primary database "to19dg" is opening...
Oracle Clusterware is restarting database "to19" ...
Unable to connect to database using to19
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Failed.
Ensure Oracle Clusterware successfully restarted database "to19" before proceeding
You need to patch.
Further Links and Information
- Does GI RU/RUR have to match your database’s RU/RUR?
- AutoUpgrade Utility in Oracle 19c
- MOS Note: 2645911.1 ([srvctl stop database -database db_name] is hanging for 10 minutes)
- NOTE:29580769.8 – Bug 29580769 – “srvctl modify asm -count 3” Hang for 11 mins Coredump is generated
–Mike
Hi Mike, when will 21c on-premises be released?
Hopefully soon 🙂
Cheers,
Mike
An off-topic question: does anybody know the approximate share of Data Guard shops that use the Data Guard Broker? We've been using Data Guard for 10+ years and have never used the Broker. We know its benefits and know other shops like it, but we consider it an extra layer and a burden.
I have no numbers nor did we collect them.
But from my experience, I would guess more than 50% for sure since a lot of customers use OEM. And when you use OEM, you need the broker.
Cheers,
Mike
Actually, we've been using OEM for about 20 years. But we have never used the Broker.
That's great. I'm not an OEM expert, but customers told me that you can't administer your standbys properly in OEM if the Broker is not present. You can see the targets and administer each DB individually, but all the Data Guard magic in OEM requires the Broker to be in use.
Cheers,
Mike
Oh, I see what you mean. We use OEM for the normal monitoring of the databases, including the standbys. But we use SQL*Plus to do switchovers, stop/start recovery, etc.
Hi all,
I have upgraded a DB from 11g to 19c. The job_queue_processes parameter is set to 1000 in both the old and new versions. But still a chain step is failing with the following error:
CHAIN_LOG_ID=”140669208″, STEP_NAME=”ST_BZ_BRAND_PER_EXPORT”, REASON=”Stop job with force called by user: ‘SYS'”
- The stop job was not run manually
- The instance did not crash
- No error/info is written to the alert log
Any idea what is causing this?
Hi Bashkar,
No, unfortunately not. I would generally reduce job_queue_processes to equal cpu_count, or set it to 2x cpu_count, but not higher. The default is fairly off, and I think we will change it in a later release. 1000 is way too high.
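For example, a quick sketch in SQL*Plus; the value 32 below just assumes a cpu_count of 16, so adjust it to your own cpu_count:
SQL> show parameter cpu_count
SQL> alter system set job_queue_processes=32 scope=both;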
Cheers,
Mike