Differences between Automatic Statistics Gathering job and GATHER_SCHEMA_STATS

Recently a customer raised a question whether there are differences between the Automatic Statistics Gathering job and a manual creation of stats via the GATHER_SCHEMA_STATS procedure.

The results in performance were quite interesting. Performance after an upgrade from Oracle Database 11.2.0.3 to Oracle Database 11.2.0.4 was not good when the automatic stats job got used. But performance changed significantly to the better when schema stats were created with the downside of taking more resources during the gathering.

Is the Automatic Stats Gathering job enabled?

That question can be answered quite easily. There’s a very good MOS Note:1233203.1 – FAQ: Automatic Statistics Collection displaying this query:

SELECT CLIENT_NAME, STATUS FROM DBA_AUTOTASK_CLIENT WHERE CLIENT_NAME='auto optimizer stats collection';

The MOS Note has also the code to enable (or disable) the job.

Which parameters/settings are used?

That question is a bit more tricky as the Note says: “The automatic statistics-gathering job uses the default parameter values for the DBMS_STATS procedures”. But how do I display them?

The following script will display the parameters being used during the Automatic Statistics Gathering:

SET ECHO OFF
SET TERMOUT ON
SET SERVEROUTPUT ON
SET TIMING OFF
DECLARE
   v1  varchar2(100);
   v2  varchar2(100);
   v3  varchar2(100);
   v4  varchar2(100);
   v5  varchar2(100);
   v6  varchar2(100);
   v7  varchar2(100);
   v8  varchar2(100);
   v9  varchar2(100);
   v10 varchar2(100);
BEGIN
   dbms_output.put_line('Automatic Stats Gathering Job - Parameters');
   dbms_output.put_line('==========================================');
   v1 := dbms_stats.get_prefs('AUTOSTATS_TARGET');
   dbms_output.put_line(' AUTOSTATS_TARGET:  ' || v1);
   v2 := dbms_stats.get_prefs('CASCADE');
   dbms_output.put_line(' CASCADE:           ' || v2);
   v3 := dbms_stats.get_prefs('DEGREE');
   dbms_output.put_line(' DEGREE:            ' || v3);
   v4 := dbms_stats.get_prefs('ESTIMATE_PERCENT');
   dbms_output.put_line(' ESTIMATE_PERCENT:  ' || v4);
   v5 := dbms_stats.get_prefs('METHOD_OPT');
   dbms_output.put_line(' METHOD_OPT:        ' || v5);
   v6 := dbms_stats.get_prefs('NO_INVALIDATE');
   dbms_output.put_line(' NO_INVALIDATE:     ' || v6);
   v7 := dbms_stats.get_prefs('GRANULARITY');
   dbms_output.put_line(' GRANULARITY:       ' || v7);
   v8 := dbms_stats.get_prefs('PUBLISH');
   dbms_output.put_line(' PUBLISH:           ' || v8);
   v9 := dbms_stats.get_prefs('INCREMENTAL');
   dbms_output.put_line(' INCREMENTAL:       ' || v9);
   v10:= dbms_stats.get_prefs('STALE_PERCENT');
   dbms_output.put_line(' STALE_PERCENT:     ' || v10);
END;
/

The settings of the DBMS_STATS.GATHER_SCHEMA_STATS procedure are documented:
https://docs.oracle.com/database/121/ARPLS/d_stats.htm#ARPLS68577 

When you compare the two you’ll see that the settings/defaults are identical.

But what is the difference between these two?

Both activities use the same parameters. So the stats will look the sameIF they get created. The real difference between the Automatic Statistics Gathering job and a manual invocation of GATHER_SCHEMA_STATS is that the latter will refresh ALL statistics whereas the Automatic Statistics Gathering job will refresh only statistics on objects where statistics are missing or marked as STALE.

The same behavior appears when you compare the recommendation to gather dictionary statistics before the upgrade by using DBMS_STATS.GATHER_DICTIONARY_STATS versus a DBMS_STATS.GATHER_SCHMEA_STATS(‘SYS’)call. The latter will refresh all statistics whereas the first one will take less resources but refresh only STALE and missing statistics.

A simple example

This script is kept as simple as possible.

  • It creates a test user
  • It creates two tables within this user – tablespace USERS
  • It inserts and updates information in the two tables
  • It flushes the monitoring information (how many DMLs got run?) out
  • It gathers stats on only one table to verify that STALE is working as intended
  • It kicks off the automatic stats gathering job
  • It kicks off the schema stats gathering call
  • It compares results before/after in the stats history table
set timing on
set serverout on
set echo on
set termout on
column table_name Format a5
column owner      Format a6
column stale_stats Format a4
column last_analyzed Format a15
column sample_size format 9999999
drop user test1 cascade;
create user test1 identified by test1;
grant connect, resource, dba to test1;
alter user test1 default tablespace USERS;
create table TEST1.TAB1 as select * from dba_objects where rownum<50001;
exec dbms_stats.gather_table_stats('TEST1','TAB1');
create table TEST1.TAB2 as select * from dba_objects where rownum<50001;
exec dbms_stats.gather_table_stats('TEST1','TAB2');
insert into TEST1.TAB1 select * from dba_objects where rownum<50001;
commit;
insert into TEST1.TAB2 select * from dba_objects where rownum<50001;
commit;
insert into TEST1.TAB2 select * from dba_objects where rownum<50001;
commit;
update TEST1.TAB1 set object_id=object_id+0;
commit;
update TEST1.TAB2 set object_id=object_id+1;
commit;
exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
select table_name,owner,stale_stats,to_char(last_analyzed,'DD-MON HH24:MI:SS') LAST_ANALYZED,SAMPLE_SIZE from dba_tab_statistics where table_name in ('TAB1','TAB2');
exec DBMS_STATS.GATHER_TABLE_STATS('TEST1','TAB1');
select table_name,owner,stale_stats,to_char(last_analyzed,'DD-MON HH24:MI:SS') LAST_ANALYZED,SAMPLE_SIZE from dba_tab_statistics where table_name in ('TAB1','TAB2');
exec DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS;
pause Wait a bit - then press return ...
select table_name,owner,stale_stats,to_char(last_analyzed,'DD-MON HH24:MI:SS') LAST_ANALYZED,SAMPLE_SIZE from dba_tab_statistics where table_name in ('TAB1','TAB2');
exec dbms_stats.gather_schema_stats('TEST1');
select table_name,owner,stale_stats,to_char(last_analyzed,'DD-MON HH24:MI:SS') LAST_ANALYZED,SAMPLE_SIZE from dba_tab_statistics where table_name in ('TAB1','TAB2');
prompt End ...

The results

exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

TABLE OWNER  STAL LAST_ANALYZED
SAMPLE_SIZE
----- ------ ---- --------------- -----------
TAB1  TEST1  YES  29-FEB 22:37:07       50000
TAB2  TEST1  YES  29-FEB 22:37:07       50000

exec DBMS_STATS.GATHER_TABLE_STATS('TEST1','TAB1');

TABLE OWNER  STAL LAST_ANALYZED   SAMPLE_SIZE
----- ------ ---- --------------- -----------
TAB1  TEST1  NO   29-FEB 22:37:12 100000
TAB2  TEST1  YES  29-FEB 22:37:07  50000

exec DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS;

TABLE OWNER  STAL LAST_ANALYZED   SAMPLE_SIZE
----- ------ ---- --------------- -----------
TAB1  TEST1  NO   29-FEB 22:37:12 100000
TAB2  TEST1  NO   29-FEB 22:37:13 150000

exec dbms_stats.gather_schema_stats('TEST1');

TABLE OWNER  STAL LAST_ANALYZED   SAMPLE_SIZE
----- ------ ---- --------------- -----------
TAB1  TEST1  NO   29-FEB 22:37:43 100000
TAB2  TEST1  NO   29-FEB 22:37:43 150000

The results can be interpreted this way:

  • The sample size of 50k is based on the first activity during the CTAS
  • Once table TAB1 gets analyzed the sample size is now correct – and the time stamp got updated – statistics on TAB2 are still marked STALE of course as the underlying table has changed by more than 10%
  • The Automatic Statistics Gathering job will refresh only stats for objects where stats are missing or marked STALE – in this example here TAB2. Table TAB1‘s statistics remain unchanged.
  • When the GATHER_SCHEMA_STATS job gets invoked it will refresh all statistics – regardless if they were STALE or not.

This is the behavior the customer who raised the question about differences in these two ways to create statistics may have seen. The GATHER_SCHEMA_STATS job took longer and consumed more resources as it will refresh all statistics regardless of the STALE attribute.

And it’s hard to figure out why the refresh of statistics created in a previous release may have led to suboptimal performance, especially as we talk about a patch set upgrade – and not a full release upgrade. Thanks to Wissem El Khlifi who twittered the following annotations I forgot to mention:

  • The Automatic Statistics Gathering job prioritizes objects with NO statistics over objects with STALE statistics
  • The Automatic Statistics Gathering job may get interrupted or skip objects leaving them with NO statistics gathered. You can force this by locking statistics – so the Auto job will skip those completely

You’ll find more information about the Automatic Statistics Gathering job here:

And another strange finding …

When I played with this example in 12c I encountered the strange behavior of the GATHER_OPTIMIZER_STATS call taking exactly 10 minutes unti it returns to the command prompt.

First I thought this is a Multitenant only issue. But I realized quickly: this happens in non-CDB databases in Oracle 12c as well. And when searching the bug database I came across the following unpublished bug:

  • Bug 14840737
    DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS RETURNS INCORRECTLY

which got logged in Oct 2012 and describes this exact behavior. I kick off the job – it will update the stats pretty soon after – but still take 10 minutes to return control to the command prompt. It is supposed to be fixed in a future release of Oracle Database …

 

–Mike

8 thoughts on “Differences between Automatic Statistics Gathering job and GATHER_SCHEMA_STATS

  1. whereas the Automatic Statistics Gathering job will refresh only statistics on objects where statistics are marked as STALE.

    Instead it should be like this

    whereas the Automatic Statistics Gathering job will refresh only statistics on objects where statistics are marked as STALE or MISSING.

    correct me if i am wrong.

  2. > GATHER_SCHEMA_STATS job gets invoked it will refresh all statistics – regardless if they were STALE or not.

    This behavior is controlled by parameter options:
    GATHER: (Default) Gathers statistics on all objects in the schema.
    With "option => GATHER AUTO" it should be the same as Automatic Statistics Gathering job.

    More convenient way to list parameters:
    select
    dbms_stats.get_prefs(‘AUTOSTATS_TARGET’ ) AUTOSTATS_TARGET,
    dbms_stats.get_prefs(‘CASCADE’ ) CASCADE,
    dbms_stats.get_prefs(‘DEGREE’ ) DEGREE,
    dbms_stats.get_prefs(‘ESTIMATE_PERCENT’ ) ESTIMATE_PERCENT,
    dbms_stats.get_prefs(‘METHOD_OPT’ ) METHOD_OPT,
    dbms_stats.get_prefs(‘NO_INVALIDATE’ ) NO_INVALIDATE,
    dbms_stats.get_prefs(‘GRANULARITY’ ) GRANULARITY,
    dbms_stats.get_prefs(‘PUBLISH’ ) PUBLISH,
    dbms_stats.get_prefs(‘INCREMENTAL’ ) INCREMENTAL,
    dbms_stats.get_prefs(‘STALE_PERCENT’ ) STALE_PERCENT
    from DUAL;

    All altogether:
    – There are real parameters used for the new GATHER_SCHEMA_STATS job included, so we can just guess what caused the difference.
    – Differences between Automatic Statistics Gathering job and GATHER_SCHEMA_STATS are still unclear
    – And the main question: WHY automatic stats job was not able to provide optimal performance? The Automatic job could be interrupted once, but it surely took several days until they decided to create their own job, so this hardly is an explanation.

  3. Did I say something different?
    And actually I prefer a well-formatted output 😉 But thanks for the hint with the query.

    The main question got answered actually:
    "The real difference between the Automatic Statistics Gathering job and a manual invocation of GATHER_SCHEMA_STATS is that the latter will refresh ALL statistics whereas the Automatic Statistics Gathering job will refresh only statistics on objects where statistics are missing or marked as STALE."
    Try it.

    And that turned out to be the reason for suboptimal performance.

    Cheers
    Mike

  4. Sorry for being late here.

    Took this question to Asktom and learnt a couple of interesting things.

    1) It requires SYSDBA privilege to run this procedure
    exec dbms_auto_task_immediate.gather_optimizer_stats;
    without that no changes to stats.

    2) dbms_auto_task_immediate is not documented in "PL/SQL Packages and Types Reference" – Informed documentation team regarding this.

    https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9527525900346371992

  5. Thank you Mike – very useful information .

    Most of the DBAs may not aware of that the Automatic Statistics Gathering job will refresh only statistics on objects where statistics are marked as STALE or MISSING.

    And Gather Schema stats will collect / refresh all objects statistics .

    Regards

    Murari.

  6. Hi Mike,

    Why is

    exec DBMS_AUTO_TASK_IMMEDIATE.GATHER_OPTIMIZER_STATS

    followed by:

    pause Wait a bit – then press return …

    i.e. why doesn’t the procedure finish ?

    Thanks
    Andrew

  7. @Andrew:
    The data will have to be written to the dictionary tables. I just need to make sure that I see the refreshed data and not the data from before my action.

    Cheers
    Mike

Leave a Reply

Your email address will not be published. Required fields are marked *