Binary patching is slow because of the inventory

The headline of this blog post may sound a bit cryptic: Binary patching is slow because of the inventory. But actually this seems to be one of the main causes when your binary patching gets slower and slower with every patch you apply.

The more you patch, the worse it gets

Well, this is a tough headline. But you may have read my blog posts about my quarterly patching experience. It actually started with the July 2021 RU in my 19c environment when I wrote:

At least in my environment, both checks with the new opatch 25 take an awful long time. I think I waited 3 minutes for each check to return back to the command prompt. And since opatch does the checks again when I call the apply in the next stage, the wait time happens again when it tells me “Verifying environment and performing prerequisite checks…“.

I didn’t investigate much further as you can see in my blog post from July 2021. But I saw the same behavior in the following patch cycles for October 2021 and January 2022. And of course, in April 2022 it were now three times 5 minutes and more of waiting time.

Now you’d say: Who cares about five minutes more or less? But keep in mind that I patch only the database home, no GI patching involved. And furthermore, I have just a toy environment where I patch quarterly.

From customer’s feedback such as Peter’s nailed down in:

Bug 33425636 – OPATCHAUTO VER 27 BINARIES APPLY TAKES LONG TIME 19.12 Sev 1 SR

it clearly looked like as if the more patches you’ve applied, the longer it finally takes.

Which is bad since we ask you to patch quarterly. And it would mean that you would suffer more from this effect the more often you patch your database.

 

A larger development test

Since we’ve had now enough information from various customers we could discuss this based on a lot of data in development. My mates started a large test with a multiple-node RAC environment and a CDB with many PDBs.

As of now, there were several effects reported by several customers:

  • The more patches you have, the longer the binary patching takes
  • The more cluster nodes you have, the longer it takes
  • The more databases you run on each cluster node, the longer it takes
  • The more PDBs you have, the longer it takes

And even though some of the above statements seem to be obvious, the effect in some cases was “interesting”.

Since this blog post here is only about the binary patching, I will divert the other questions (and hopefully answers) to additional blog posts in this series.

Back to the development test runs, it became clear that we can’t see the drastic effect for the binary patching. Instead, datapatch seemed to have a bigger effect. So while customers such as Peter or Sven had terrible and longer patching “experiences”, we internally didn’t see this. At least not as impacting as with the customer environments.

 

I’m not alone

Luckily, there are very smart people out there. And some read my blog.

I quoted Balaji Govindu’s blog post on LinkedIn before already (you need to have a LinkedIn user to see it). But let me describe what he did since he saw the same boring messages in his patch logs I saw: He edited a file in the inventory keeping track of the applied patches on the system: $ORACLE_HOME/inventory/ContentsXML/oui-patch.xml. And by doing so, he could drastically reduce the binary patching time.

It seems to be obvious when you tail -f the logs during a patch run. Opatch is checking for dependencies. And unfortunately it looks like that with every other patch the check of dependencies takes factor X longer than before. So it is not just “one single check more” but a multiplier. The more you patch, the worse it gets.

 

A quick comparison

So before I show you the workaround, I will test this in my toy environment, our Hands-On Lab. The setup is a 19.14.0 database home with OJVM installed. The target will be 19.15.0. I will do two runs and compare the timings and some information from the logs.

Run 1 – no changes to oui-patch.xml

When I do the checks for space and conflicts at first, I already recognize a long runtime.

$ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./

and

$ORACLE_HOME/OPatch/opatch prereq CheckSystemSpace -ph ./

each take several minutes. To be precise, each of the checks in my tiny little playground environment takes 5 min 29 seconds. Each of them of course. So you need to multiply the wait time by two. Hence, these two simple prechecks run 11 minutes together. Keep in mind at this stage that I patch only my database home. No Grid Infrastructure is involved which has the exact same checks with maybe less patches.

In the log file which I tailed in parallel, I see the following messages – and then for several minutes … nothing.

[May 9, 2022 6:23:41 PM] [INFO]     [OPSR-TIME] Loading raw inventory
[May 9, 2022 6:23:42 PM] [INFO]     [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 25 (MB)
[May 9, 2022 6:23:42 PM] [INFO]     [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 34 (MB)
[May 9, 2022 6:23:42 PM] [INFO]     [OPSR-TIME] Raw inventory loaded successfully
[May 9, 2022 6:23:49 PM] [INFO]     [OPSR-TIME] Loading cooked inventory
[May 9, 2022 6:23:49 PM] [INFO]     [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 848 (MB)

[May 9, 2022 6:23:56 PM] [INFO]     [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 682 (MB)
[May 9, 2022 6:23:56 PM] [INFO]     [OPSR-TIME] Cooked inventory loaded successfully
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 29517242 is a subset of other poh CUP: 33192793
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 29517242 is a subset of other poh CUP: 33515361
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 30557433 is a subset of other poh CUP: 30869156
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 30557433 is a subset of other poh CUP: 31281355
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32218454 is a subset of other poh CUP: 32545013
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32218454 is a subset of other poh CUP: 32904851
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32218454 is a subset of other poh CUP: 33192793
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32218454 is a subset of other poh CUP: 33515361
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32545013 is a subset of other poh CUP: 32904851
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32545013 is a subset of other poh CUP: 33192793
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32545013 is a subset of other poh CUP: 33515361
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32904851 is a subset of other poh CUP: 33192793
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 32904851 is a subset of other poh CUP: 33515361
[May 9, 2022 6:23:56 PM] [INFO]     CUP_LOG: Found poh CUP 33192793 is a subset of other poh CUP: 33515361

You recognized the patch IDs of my previously applied patches. I silently assume that opatch now tries to sort out dependencies and potential conflicts.

Of course, I can do nothing than waiting for it to complete. Unfortunately these checks run again later when I invoke opatch to apply the patch bundle. And now you can do the math what happens on a 4-node cluster. Yes, opatch will do this exercise over and over again, node by node.

And please don’t ask me why this takes so hilariously long. In an internal discussion a colleague said: “Well, you’d think that if the data is in memory it should be processes within a few seconds.”. Hm …

And it happens again

Now when I apply the patch, oh wonder, opatch does the same checks again by itself. So I wonder why I’m always following the README and run the checks when opatch is just repeating them. Does it expect my disk to be full a few minutes later? Or does it expect that I secretly applied several other patches in between?

Anyhow, you know it already – my log file looks the same from start with another 5.5 minutes of waiting time when this happens:

$ $ORACLE_HOME/OPatch/opatch apply
Oracle Interim Patch Installer version 12.2.0.1.30
Copyright (c) 2022, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/19
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/19/oraInst.loc
OPatch version    : 12.2.0.1.30
OUI version       : 12.2.0.7.0
Log file location : /u01/app/oracle/product/19/cfgtoollogs/opatch/opatch2022-05-09_18-29-45PM_1.log

Verifying environment and performing prerequisite checks...

Here I sit and wait again.

And then, finally, opatch applies the patch bundle to my existing home.

So let me summarize this quickly:

  • Precheck manually for conflicts: 5 minutes 29 seconds
  • Precheck manually for space: 5 minutes 26 seconds
  • Precheck done by opatch during “apply”: 5 minutes 29 seconds
  • Raw patch apply duration for 19.15.0:
    3 minutes 58 seconds including copying files for rollback

That doesn’t look pleasant. Over 16 minutes waiting for checks versus less than 4 minutes for the actual patching activity.

Something seems to be wrong here. Especially since I can’t see the same behavior in my 21c or my 12.2.0.1 homes despite the fact that I use the exact same opatch version in all three homes and my 12.2.0.1 home has seen almost as many patch bundles as my 19c home has gotten.

This is not the end of the story

I’d like to apply the OJVM bundle from April 2022 as well. Hence, I need to run the checks again according to the README.

You may guess what happens again? Of course, opatch mows through the same inventory and does the exact same checks again. And they take as long as the previous ones with one correction: Since my source home now contains the database RU 19.15.0 already, there is one more patch bundle than before.

I save you the time and reveal the result. 5 minutes 52 seconds now for every check. Times three since I do a conflict check, a space check – and even if I’d save both and leave it to opatch, it will do the same check again. Over 18 minutes for checks now, or at least almost 6 minutes during the apply run.

The actual patching of the OJVM bundle itself took only 52 seconds.

 

The workaround

Ok, enough read about how long it takes and what potentially went wrong or is not optimal. You came here to read about the workaround. And the effect it has. So let us start here with some editing work. And keep one important thing in mind: This is NOT an officially supported workaround from Oracle. This is just a hint how you could improve patching duration. Always keep the original oui-patches.xml file. And be aware that it didn’t get updated when you patch with a edited copy of it.

In order to emphasize this, the file even contains a header line telling you:

<!-- Copyright (c) 2022 Oracle Corporation. All Rights Reserved. 
 Do not modify the contents of this file by hand.

I guess now it is clear that you won’t convince Oracle Support to approve manual edits.

Run 2 – manual edit of oui-patch.xml

Let me wipe out opatches’ memory a bit. Why does opatch need to check for a patch which I removed already months ago from my hard disk?

My $ORACLE_HOME/inventory/ContentsXML/oui-patch.xml has 1001 lines at the begin of my little experiment.

You will see that the structure is simple. A patch bundle is enclosed with XML tags between:

<ONEOFF REF_ID="30125133" UNIQ_ID="23151502" TYPE="upgrade" LANGS="en" ROLLBACK="F" XML_INV_LOC="oneoffs/30125133/" ACT_INST_VER="12.2.0.7.0" INSTALL_TIME="2019.Oct.16 19:27:15 CEST">
<DESC>Database Release Update : 19.5.0.0.191015 (30125133)</DESC>
<REF_LIST>

...

</BUG_LIST>
<FILE_LIST/>
<PROPERTY_LIST/>
</ONEOFF>

I will carefully remove now all blocks except the block(s) for 19.14.0 patches.

A very important annotation since I received a comment on the blog:
When you have one off and/or merge patches for this specific RU in your XML file, then please leave them and don’t wipe them out. Only remove entries which are older. So I consider for instance “n-1” as 19.14.0 with the OJVM 19.14.0 and with one-off patches on top of it.

As a result, my file contains only 136 lines and just two patch bundles, the 19.14.0 RU and the 19.14.0 OJVM:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright (c) 2022 Oracle Corporation. All Rights Reserved. 
 Do not modify the contents of this file by hand. 
-->
<ONEOFF_LIST>
<ONEOFF REF_ID="33197296" UNIQ_ID="24445578" TYPE="unknown" LANGS="en" ROLLBACK="F" XML_INV_LOC="oneoffs/33197296/" ACT_INST_VER="12.2.0.7.0" INSTALL_TIME="2021.Dec.16 09:42:01 CET">
<DESC>JDK BUNDLE PATCH 19.0.0.0.211019</DESC>
<REF_LIST>
	<REF NAME="oracle.jdk" VER="1.8.0.191.0"/>
	<REF NAME="oracle.jdk" VER="1.8.0.201.0"/>
</REF_LIST>
<BUG_LIST>
	<BUG>33197296</BUG>
</BUG_LIST>
<FILE_LIST/>
<PROPERTY_LIST/>
</ONEOFF>
<ONEOFF REF_ID="33515361" UNIQ_ID="24589353" TYPE="upgrade" LANGS="en" ROLLBACK="F" XML_INV_LOC="oneoffs/33515361/" ACT_INST_VER="12.2.0.7.0" INSTALL_TIME="2022.Jan.19 21:37:55 CET">
<DESC>Database Release Update : 19.14.0.0.220118 (33515361)</DESC>
<REF_LIST>
	<REF NAME="oracle.help.ohj" VER="11.1.1.7.0" PATCH_LEVEL="3"/>
	<REF NAME="oracle.perlint" VER="5.28.1.0.0" PATCH_LEVEL="3"/>
	<REF NAME="oracle.rdbms.locator" VER="19.0.0.0.0" PATCH_LEVEL="3"/>
	<REF NAME="oracle.perlint.expat" VER="2.0.1.0.4" PATCH_LEVEL="3"/>
	<REF NAME="oracle.rdbms.rsf" VER="19.0.0.0.0" PATCH_LEVEL="3"/>

... here I cut several lines to shorten it only for display - these lines were left in the file I used ...

	<REF NAME="oracle.precomp.lang" VER="19.0.0.0.0" PATCH_LEVEL="3"/>
	<REF NAME="oracle.jdk" VER="1.8.0.191.0" PATCH_LEVEL="3"/>
	<REF NAME="oracle.jdk" VER="1.8.0.201.0" PATCH_LEVEL="3"/>
</REF_LIST>
<BUG_LIST>
	<BUG>7391838</BUG><BUG>8460502</BUG><BUG>8476681</BUG><BUG>14570574</BUG><BUG>14735102</BUG><BUG>15931756</BUG><BUG>15959416</BUG><BUG>16662822</BUG><BUG>16664572</BUG><BUG>16750494</BUG><BUG>17275499</BUG><BUG>17395507</BUG><BUG>17428816</BUG><BUG>17468475</BUG><BUG>17777718</BUG><BUG>18534283</BUG><BUG>18697534</BUG><BUG>19080742</BUG><BUG>19138896</BUG><BUG>19697993</BUG><BUG>20007421</BUG><BUG>20083476</BUG><BUG>20313356</BUG><BUG>20319830</BUG><BUG>20479545</BUG><BUG>20867658</BUG><BUG>20922160</BUG><BUG>21119541</BUG><BUG>21232786</BUG><BUG>21245711</BUG><BUG>21374587</BUG><BUG>21528318</BUG><BUG>21629064</BUG>

... here I cut over 100 lines of bug numbers only for display - those were left in the file I used ...

<BUG>33482590</BUG><BUG>33489699</BUG><BUG>33490031</BUG><BUG>33500486</BUG><BUG>33507610</BUG><BUG>33507953</BUG><BUG>33514179</BUG><BUG>33515115</BUG><BUG>33540746</BUG><BUG>33558058</BUG><BUG>33558251</BUG><BUG>33558391</BUG><BUG>33559316</BUG><BUG>33563167</BUG><BUG>33599734</BUG><BUG>33613512</BUG><BUG>33632051</BUG><BUG>33635094</BUG><BUG>33651003</BUG><BUG>33656104</BUG><BUG>33656608</BUG><BUG>33674035</BUG><BUG>33661960</BUG><BUG>30166257</BUG><BUG>33384092</BUG><BUG>33618962</BUG>
</BUG_LIST>
<FILE_LIST/>
<PROPERTY_LIST/>
<SUPERSEDES>
	<REF_ID REF_ID="33192793" UNIQ_ID="24462514" LANGS="en"/>
</SUPERSEDES>
</ONEOFF>
<ONEOFF REF_ID="33561310" UNIQ_ID="24538862" TYPE="unknown" LANGS="en" ROLLBACK="F" XML_INV_LOC="oneoffs/33561310/" ACT_INST_VER="12.2.0.7.0" INSTALL_TIME="2022.Jan.19 22:12:28 CET">
<DESC>OJVM RELEASE UPDATE: 19.14.0.0.220118 (33561310)</DESC>
<REF_LIST>
	<REF NAME="oracle.javavm.server" VER="19.0.0.0.0"/>
	<REF NAME="oracle.javavm.server.core" VER="19.0.0.0.0"/>
	<REF NAME="oracle.rdbms.dbscripts" VER="19.0.0.0.0"/>
	<REF NAME="oracle.rdbms" VER="19.0.0.0.0"/>
	<REF NAME="oracle.javavm.client" VER="19.0.0.0.0"/>
</REF_LIST>
<BUG_LIST>
	<BUG>29445548</BUG><BUG>29254623</BUG><BUG>29540327</BUG><BUG>29774362</BUG><BUG>30134746</BUG><BUG>30160625</BUG><BUG>30534662</BUG><BUG>29512125</BUG><BUG>29942275</BUG><BUG>30855101</BUG><BUG>31306261</BUG><BUG>31359215</BUG><BUG>30895577</BUG><BUG>29224710</BUG><BUG>26716835</BUG><BUG>31668872</BUG><BUG>32165759</BUG><BUG>32069696</BUG><BUG>32032733</BUG><BUG>30889443</BUG><BUG>30674373</BUG><BUG>32167592</BUG><BUG>32523206</BUG><BUG>29415774</BUG><BUG>28777073</BUG><BUG>32124570</BUG><BUG>31247838</BUG><BUG>29540831</BUG><BUG>32892883</BUG><BUG>31776121</BUG><BUG>33223248</BUG><BUG>33563137</BUG><BUG>33184467</BUG><BUG>31844357</BUG>
</BUG_LIST>
<FILE_LIST/>
<PROPERTY_LIST/>
</ONEOFF>
</ONEOFF_LIST>

The results after wiping out opatches’ memory

You’ll be surprised … NOT … I guess.

Applying the database 19.15.0 RU to my environment with all the checks run manually upfront took:

  • Precheck manually for conflicts: 9 seconds
  • Precheck manually for space:  8 seconds
  • Precheck done by opatch during “apply”: 12 seconds
  • Raw patch apply duration for 19.15.0:
    3 minutes 51 seconds including copying files for rollback

I guess you’ll have no further questions anymore.

For the OJVM RU in addition, the numbers for the checks went up by 1-2 seconds. Both checks took 10 seconds each.

And the final patch apply took end-to-end 1 minute 11 seconds including everything.

So in sum, patching my environment with 19.15.0 RU Database and 19.15.0 OJVM took less than 6 minutes. That is roughly the time a single check on my system took before with the 1001 line long patch inventory file.

 

The n-1 discussion … revisited

As you read potentially already in the initial blog post for this series, I always get thrown back on the n-1 topic. Since opatch does not offer a possibility at the moment to just purge everything older than n-1 patches, e.g. keeping only the current patch information such as April 2022 and the previous patch bundle, January 2022, but getting rid of everything older than January 2022, we help ourselves here.

Basically, the workaround does exactly do what a n-1 purging mechanism would do: Wipe out opatches’ memory and just keep information about the previous patch.

Now I discussed this with the opatch team as well. And I’ve been told that it is not trivial to implement such a purge behavior. But the team will discuss and consider it.

I’m still very positive that it will come eventually the sooner or later.

Just keep in mind when you use the workaround to protect your previous oui-patches.xml file.

 

Customer experience

I’d like to share the results from Sven since he redid and compared the runs, without changes to the XML file and with my proposed changes. Before, the patching times were the worst I have ever seen. Sven is on SPARC Solaris here, and the below times are just for 19.15.0 for the database home.

  1. Test run 1 with opatch 12.2.0.1.29 (Jan 22 opatch)
    Time taken to complete the session 146 minutes, 11 seconds
  2. Test run 2 with opatch 12.2.0.1.30 (Apr 22 opatch)
    Time taken to complete the session 111 minutes, 51 seconds
  3. Test run 3 with opatch 12.2.0.1.30 and oui-patch.xml edit to n-1
    Time taken to complete the session 40 minutes, 58 seconds

So you recognize two things:

  1. There is a dependency on the opatch version (future blog post)
  2. There is a significant improvement with a n-1 inventory file
    .

Summary

Let me summarize this tiny little exercise here.

At first, it is clear that the manual edit of $ORACLE_HOME/inventory/ContentsXML/oui-patch.xml is not supported. But it is VERY efficient to do it. But keep in mind that you will see this problem I described above only with in-place patching. When you do out-of-place patching as we recommend, you will start with a fresh vanilla $ORACLE_HOME/inventory/ContentsXML/oui-patch.xml. Hence, this issue won’t affect you at all.

But in case you are suffering from long binary patching times, check your logs. Even when you are not doing the prechecks beforehand. Especially in RAC environments where more than one node gets patches, and especially in environments where you patch not only the database but also the Grid Infrastructure, you may want to check your opatch logfiles. opatch always tells you the location of the logs when you call opatch. And you will always find them in $ORACLE_HOME/cfgtoollogs/opatch.

The result above from my tiny little test environment is impressive. I compare more than 16 minutes patching duration with the large xml file to 5.5 minutes with a reduced  file – even when I don’t run the checks beforehand and leave it to opatch. The result gets worse when I run all the recommended checks manually by myself.

Of course, development has taken notice of this as well. And I guess, the sooner or later we may see a fix or a purge mechanism. But since this tool is not owned by our team, I can’t promise anything.

Finally, let me please ask you NOT to shoot the messenger. Thanks 🙂

 

Further Links and Information

–Mike

Share this: