When Fixed Object Stats Gathering takes very long during upgrade …

Over the past days I exchanged several emails with a very important customer regarding Fixed Objects Stats gathering in relation to database upgrade. The customer found out that it takes very long in their environment – over an hour to be precise. Relaxing on the weekend helped me a bit. And this morning, it dawned me that I’ve had two similar cases before already. Plus I didn’t blog about it yet. Now it’s time to tell you a little but about Unified Auditing and the situation when Fixed Object Stats Gathering takes very long during upgrade …

When Fixed Object Stats Gathering takes very long during upgrade ...

Photo by Ayse A on Unsplash

Why “Fixed Objects Stats”?

Fixed objects are essentially X$ tables or structures. And plenty of them are dynamic. Gathering stats makes only real sense when the database is warmed up, the application is running – and after several hours or even a few days, you trigger the stats collection. Please be aware that the database does not track staleness Fixed Objects Statistics. Hence, it is either your duty as a DBA to refresh them from time to time. Or rely on the ability and available resources of the automatic stats gathering job.

Fixed Objects Stats are collected when the database gets created. And sometimes, when you refresh them, problems go away. But unfortunately not all of them.

We gather statistics on fixed objects before upgrade. This is usually representative as the database and the application(s) is/are operating for a while. But indeed, gathering statistics on fixed objects right after the upgrade doesn’t guarantee representative values for all fixed objects. It requires the application to be up and running. Hence, we recommend for ages to refresh Fixed Objects Statistics a week or so after the upgrade.

Thanks to Mike Hallas for his explanations and insights.

Please see also Daniel’s blog post about how to gather fixed objects stats efficiently after the upgrade with AutoUpgrade.

What means “long”?

It may be very relative what “long” means. But based on experience, I’d say that everything between 1 and 5 minutes is pretty fine, even a few more minutes may be tolerable. But when the stats gathering for fixed objects consumes now 30 minutes or an hour, or even more, than something is wrong. In the cases I came across in the past months, AutoUpgrade failed in two cases as it ran into a timeout when it hit 12 hours!

2020-07-07 13:30:21.831 INFO Executing SQL [DECLARE
BEGIN
  SYS.DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
] in [edicomdcon, container:EDICOMD] - ExecuteSql$SQLClient.run 
2020-07-08 01:29:24.696 WARNING This thread was interrupted while waiting for the following query to finish [DECLARE
BEGIN
  SYS.DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
/
] [EDICOMD] - ExecuteSql.quickSQL 
2020-07-08 01:29:24.714 ERROR 
============================ check info ============================
[EDICOMD][POST_FIXED_OBJECTS][RECOMMEND]
============================ check info ============================
=========================== trace start ==============================
Exception: SQLException
Err message: Process abort was requested for query.
oracle.upgrade.commons.sql.ExecuteSql.quickSQL(ExecuteSql.java:631)
oracle.upgrade.commons.sql.ExecuteSql.quickSQL(ExecuteSql.java:571)
oracle.upgrade.commons.dbinspector.tools.Check.runFix(Check.java:199)
oracle.upgrade.commons.dbinspector.tools.FixUpTrigger.executeFixUp(FixUpTrigger.java:105)
oracle.upgrade.commons.dbinspector.tools.FixUpTrigger.call(FixUpTrigger.java:75)
oracle.upgrade.commons.dbinspector.tools.FixUpTrigger.call(FixUpTrigger.java:40)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

============================ trace end ===============================
 - FixUpTrigger.executeFixUp 
2020-07-08 01:29:24.716 INFO Finished FIXUP [POST_FIXED_OBJECTS][EDICOMD][RUNTIME_ERR] - FixUpTrigger.executeFixUp

So THIS is long for sure. I guess you’ll agree with me.

What can you do in such cases?

The first remedy in such cases is to skip the fixups AutoUpgrade does. This is not complicated – and of course the upgrade will progress than without the stats gathering you’ve skipped. It may take a bit longer compared to a case where stats have been refreshed beforehand. But of courser in cases where the Fixed Objects Stats gathering takes hours, that is certainly acceptable by you I’d assume.

Please read on here in case you need to skip stats gathering (dictionary and/or fixed objects stats) during AutoUpgrade:

As Daniel has proven now, with newer versions of AutoUpgrade, you indeed can skip fixups PRE and POST upgrade.

And actually with the most recent version of AutoUpgrade, we don’t do the Fixed Objects Stats gathering after upgrade anymore.

But what is the root cause?

You still may ask yourself: Why is he writing so many words instead of explaining the root cause here?

There is a simple reason for this. At first, there is no silver bullet. But there is a typical pattern we came across. I just don’t want to put out the word stating “this is the case and there are no other possibilities”. So please, take this as a potential solution, and not as a universal. You still may please need to open an SR and check with Oracle Support in such cases.

When another customer mailed me in July wondering about the 12 hours fixed objects stats gathering from the log snippet above, he pointed me to this MOS note:

Unfortunately the MOS note does not contain any reference to a bug or any sort of background information. But the relation to Unified Auditing is quite interesting. It draws a line to “many” audit records in the audit trail:

SELECT COUNT(*) FROM unified_audit_trail;

and the duration of gathering fixed objects stats.

Interestingly enough, at the almost exact same time I exchanged emails with a banking customer in South America facing the exact same problem, but in this case with an upgrade to 18c, not to 19c.

And as I started writing above, last week one of our VIP customers wondered as well about gathering fixed objects stats before and after the upgrade, taking more than 1 hours each way.

In this case, this lead to:

2020-10-17 14:00:04.002 INFO ----------------------Stages  Summary------------------------ - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     SETUP             <1 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     GRP               <1 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     PREUPGRADE        <1 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     PRECHECKS         <1 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     PREFIXUPS         5 min                                   - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     DRAIN             <1 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     DBUPGRADE         57 min                                  - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     POSTCHECKS        1 min                                   - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     POSTFIXUPS        126 min                                 - DispatcherOSHelper.writeStageSummary
2020-10-17 14:00:04.002 INFO     POSTUPGRADE       1 min                                   - DispatcherOSHelper.writeStageSummary

You recognize quickly that the POSTFIXUPS part has taken longer than the actual upgrade itself. And it was one single PDB causing this trouble spinning on ASM related X$ tables:

  • X$KFBH describes the ASM cache (buffer cache of ASM in blocks of 4K (_asm_blksize))
  • X$KFFIL lists ASM files, including metadata/asmdisk files

Which is even more strange as the upgrade does not happen in an ASM instance. But the database is a RAC database being in ASM of course.

You can check by yourself (I’m pasting the customer’s query here – thank you, Ändu!):

SELECT v.name, ts.analyzetime
FROM V$FIXED_TABLE v, SYS.TAB_STATS$ ts
WHERE v.object_id = ts.obj# and analyzetime > sysdate -0.5
order by analyzetime desc
;

So there does not seem to be this one single root cause. Even though this customer is a heavy user of Unified Auditing as well.

Purging the audit records seems to be a good idea to have less time spent on gathering fixed objects stats:

BEGIN
DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
audit_trail_type => dbms_audit_mgmt.audit_trail_unified,
use_last_arch_timestamp => FALSE);
end;
/

But personally I’d consider this only as a workaround but not as a real solution.

Further Information and Links

–Mike

Share this: