I love visiting customers onsite. Last week I visited die Mobiliar in Bern, Switzerland. I had received a list of open issues to discuss beforehand – which is a very good way to prepare a visit. And when we were all sitting together, there was this “Ah, one final thing”. They had an issue with traces the database writes every few seconds. As a remedy the DBAs increased the backup frequency to remove the traces, as otherwise the system would potentially run out of inodes or space. All the traces had the same pattern. And I learned quickly: these are MMON unconditional traces in Oracle 12.2.0.1.
MMON unconditional traces in Oracle 12.2.0.1
This happens sometimes. And it is not nice. A developer forgets to remove a debug or trace event. Or the condition is not set correctly. Whatever the case is here, MMON (the Manageability Monitor background process) writes an unconditional trace every 3 seconds. “Unconditional” means: it writes it every 3 seconds no matter what is happening.
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1
dbkrfssm: mmon not open
dbkrfssm: mmon not open
*** 2017-08-16T16:38:23.916367+02:00 (CDB$ROOT(1))
AUTO SGA: kmgs_parameter_update_timeout gen0 0 mmon alive 1
This is known as
Bug 25415713 - MMON TRACE FILE GROWS WHEN NO TRACES ARE ENABLED. And the solution is to apply the patch for it.
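To gauge how badly you are affected, a quick check like the following can help. This is only a sketch: the `flooded_traces` helper is made up, and you have to point it at your own ADR trace directory (the path in the usage comment is an example, not a given).

```shell
# Sketch: spot trace files flooded with the MMON "unconditional" message.
MSG="kmgs_parameter_update_timeout"

flooded_traces() {
  # $1 = trace directory; prints "<count> <file>" sorted by count, descending
  grep -l "$MSG" "$1"/*.trc 2>/dev/null |
  while read -r f; do
    printf '%8d %s\n' "$(grep -c "$MSG" "$f")" "$f"
  done | sort -rn
}

# Typical use (adjust to your diagnostic_dest):
#   flooded_traces /u01/app/oracle/diag/rdbms/orcl/orcl/trace
```

Running this periodically shows both symptoms reported below: either many small files appear, or a single gen0/mmon trace file keeps growing.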
As I was researching this for die Mobiliar, I wondered why such an issue – despite being known for over six months – had not been included in any Update yet. After checking with the responsible patching people, somebody recognized that the patch had been falsely classified as “non rolling installable” and was therefore not considered for inclusion in Updates and Revisions. This has now been changed, and the fix should be included in the October 2018 Update. For July 2018 the code freeze unfortunately had already passed.
In addition, the note I had initially bookmarked, 2298766.1, has been archived or deleted. Instead, this MOS Note has the details:
- MOS Note: 2319931.1
MMON trace file grows and is flooded with: kmgs_parameter_update_timeout Gen0 messages
- Die Mobiliar Blog: Why does my database generate so many traces?
- Patch 25415713
Update: May 24, 2018
As you can see in the comments section, Ross mentioned that he saw – unlike Mobiliar – a hugely growing trace file instead of many of them. The difference may be that Mobiliar is using Multitenant, but I didn’t dig deeply enough into it.
12.2 and 18 have bugs. Last year we were planning to upgrade to 12.2.0.1. Data Pump export/import was chosen as the method because we wanted to clean out the data dictionary, which had accumulated a lot of junk over more than a decade. So I chose the database with the most feature usage. The test database has 60k user accounts; production has 84k. When the import came to the ROLE_GRANTS stage, it just seemed to hang. Attaching to the job, I realized that it was processing role grants very slowly. There are 291k role grants, which were taking 18+ hours. In 11g and 12.1 they took about half an hour.
I opened an SR. There is no need to describe the torture I suffered at the hands of the analysts. My observation was that once the SYSAUTH$ table had about 60k-80k rows or more, each role grant took nearly 250 ms. It took 5+ months of literally battling with support to get a fix. They provided a fix, but in 18c: Bug 26354448. By the time the fix was provided, we had missed our deadline for the upgrade, so we decided to wait for 18c.
The moment 18c was released, I tested the same import from last year and found that the 291k role grants took only half an hour. But the happiness was short-lived. A few steps later, the import got stuck at the PROCACT_SCHEMA stage. It would not complete in 48+ hours. The SR has been open for more than a month. I have provided a reproducible test case to support, but they have failed to provide any import log to show that they even ran the test case.
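For what it’s worth, when a job stalls like this I look at the logfile rather than attaching. A sketch, assuming the job ran with LOGTIME=ALL so every line is prefixed with a timestamp (the `stage_lines` helper and the logfile name are mine, not Oracle’s):

```shell
# Print only the "Processing object type" lines from a Data Pump logfile.
# With LOGTIME=ALL each line carries a timestamp, so the stage where the
# gap between timestamps opens up is the stalled one.
stage_lines() {
  grep "Processing object type" "$1"
}

# Typical use:
#   stage_lines import.log
```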
This time the problem seems to be in unified auditing. If you turn it off, then under certain conditions the PROCACT_SCHEMA stage completes in about 20-25 minutes. Support’s only responses are: turn this feature off, turn this setting off. Why would I have to turn off an Oracle-supplied feature like unified auditing, or settings like parallelism and cluster in the import? They work just fine in 11g. I would expect newer releases to improve performance, not degrade it.
This is the reason annual releases are a bad idea. Developers will be under pressure and they will release buggy code. This also explains why some customers are reluctant to upgrade.
I dropped you an email regarding the SR numbers. I’d like to have the chance to look at them before I reply.
The post from Mobiliar explicitly mentions creating lots of files (which would explain the backups you mention). In our environment I see all the trace messages being written to the gen0 trace file and a single mmon trace file, instead of multiple files being created. It might be helpful for others investigating this issue (and validating the fix) to know that it may not be multiple trace files, but just the regular trace file growing at an abnormal rate.
Thanks for this hint. I think the difference comes from the fact that Mobiliar uses Multitenant. But I didn’t verify the root cause. I’ll update the blog post.
Yeah, we are also using Multitenant, although currently with only two created PDBs in production. We have a dev 12.2 CDB with 4 created PDBs and see the same behavior: a single gen0 and mmon trace file with the message.
Thanks for writing about this, definitely something we will keep an eye on.
I was one of the original customers who opened an SR about this.
If you are able to read all SRs, please see “SR 3-15159612741: Repeated message in MMON and gen0 tracefile for ASM 12.2”, created June 2017 (1 year ago).
The oneoff was a real nightmare (fortunately applied only in a test environment).
That’s because I tried to apply the oneoff to the Grid installation (I discovered the problem in ASM).
There were no problems or errors applying the oneoff with opatch, but afterwards GRID would not restart, with the error: “CRS-6706: Oracle Clusterware Release patch level (‘1424276726’) does not match Software patch level (‘2707839533’). Oracle Clusterware cannot be started.”
We ran different tests with engineering; in the end I asked to include this oneoff in an official RU – too dangerous a oneoff for me to take to production!!!
Thanks – and I think I read the SR already, as it was noted in the bug. And I also learned about the “grid” issue 🙁
Thanks for your input – I can see the problem with the GIMRDB as well.
Patch 25415713 does not appear to be included in the October 2018 PSU. Has the fix been merged into the code in such a way that no separate patch number can be identified?
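For reference, this is roughly how I checked – only a sketch, since it just greps the inventory listing and assumes `opatch lsinventory` prints the numbers of the applied patches (the `patch_applied` helper is my own):

```shell
# Report whether a given one-off patch number shows up in an inventory
# listing read from stdin.
# Normally fed like: opatch lsinventory | patch_applied 25415713
patch_applied() {
  if grep -q "$1" -; then
    echo "patch $1 found in inventory"
  else
    echo "patch $1 NOT found in inventory"
  fi
}
```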
Please check with Support – I can’t verify this at the moment and have no control on what Sustaining Engineering is including.
Thanks and sorry for any inconvenience 🙁
My colleagues have installed the October 2018 PSU for 12.2 and noted that data is still written to these trace files as before. Applying the 25415713 patch seems to solve it. But test it carefully before going into production.
BR / Peter