Monday, July 7, 2014

How to Gather Statistics

Whenever the data changes "significantly".
If a table goes from 1 row to 200 rows, that's a significant change. When a table goes from 100,000 rows to 150,000 rows, that's not a terribly significant change. When a table goes from 1000 rows all with identical values in commonly-queried column X to 1000 rows with nearly unique values in column X, that's a significant change.

CBO and Database Statistics

When a valid SQL statement is sent to the server for the first time, Oracle produces an execution plan that describes how to retrieve the necessary data. In older versions of the database this execution plan could be generated using one of two optimizers:

-Rule-Based Optimizer (RBO) - This was the original optimization method and as the name suggests, was essentially a list of rules Oracle should follow to generate an execution plan. Even after the cost-based optimizer was introduced, this method was used if the server had no internal statistics relating to the objects referenced by the statement, or if explicitly requested by a hint or instance/session parameter. This optimizer was made obsolete, then deprecated in later versions of the database.

-Cost-Based Optimizer (CBO) - The CBO uses database statistics to generate several execution plans, picking the one with the lowest cost, where cost relates to system resources required to complete the operation.

In newer versions of the database the cost-based optimizer is the only option available.

How do you collect statistics for a table, schema and Database?

There are two ways to gather statistics.

1)Using DBMS_STATS package.
2)Using ANALYZE command.

To Gathers statistics for all objects in a schema

SQL> select owner, table_name,last_analyzed from dba_tables where owner='SCOTT' order by last_analyzed;

SQL> select ocwner, index_name, last_analyzed from dba_indexes where owner='SCOTT' order by last_analyzed;

SQL> exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', options => 'GATHER', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE);

SQL> select owner, table_name,last_analyzed from dba_tables where owner='SCOTT' order by last_analyzed;

SQL> select owner, index_name, last_analyzed from dba_indexes where owner='SCOTT' order by last_analyzed;

Cascade: Determines if index stats should be gathered for the current table (true, false, auto_cascade).
Estimate_percent: Percentage of rows to sample when gathering stats (0.000001-100 or auto_sample_size).
Method_opt: Controls column statistics collection and histogram creation.
Options:
    a) Gather: re-analysis the whole schema
    b) Gather empty: only analyzes tables that have no existing statistics
    c) Gather stale: only reanalyzes tables with more than 10 percent 
      modifications (inserts,updates,deletes)
    d) Gather auto: will reanalyze objects that currently have no statistics and
      objects with stale statistics.Using Gather auto is like combining
      Gather stale and Gather empty.

21:48:19 SYS> select client_name, status from dba_autotask_operation;

CLIENT_NAME                                      STATUS
---------------------------------------------- ------------
auto optimizer stats collection                ENABLED
auto space advisor                             ENABLED
sql tuning advisor                             ENABLED

To Gather statistics for tables in a schema

SQL>exec dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname => 'EMP',estimate_percent=> 100,cascade => true);

Using Analyze:

Oracle strongly recommend not use ANALYZE command to estimate statistics. Yet it is supported for backward compatibility. To generate statistics of TEST table using ANALYZE use ANALYZE with estimate statistics keyword.,

In this example I verified that statistics of table column num_rows contain information after analyzing.

SQL> SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME=’TEST’;

NUM_ROWS
---------
2

SQL> INSERT INTO TEST VALUES(‘before’);

1 row created.

SQL> SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME=’TEST’;

NUM_ROWS
---------
2

SQL> ANALYZE TABLE TEST COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME=’TEST’;

NUM_ROWS
---------
3

After gather statistics num_rows contain accurate information.

The two most important questions to answer when it comes to Oracle Optimizer statistics are:
  1. What is the best method for collection of statistics?
  2. How often should I collect the statistics?
Unfortunately, there are no right or wrong answers to both the above questions as “it depends…” on your application and data.

References:


No comments:

Post a Comment