Statistics is a histogram displaying the distribution of values
in the first column of statistics.
- The query optimizer uses statistics to create query plans that improve query performance
- Statistics for query optimization are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view.
- The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result
- These cardinality estimates enable the query optimizer to create a high-quality query plan. For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the more resource-intensive index scan operator.
- Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column
- Statistics objects on multiple columns also store statistical information about the correlation of values among the columns
- These correlation statistics, or densities, are derived from the number of distinct rows of column values
- You need to create additional statistics or modify the query design for best results
Filtered Statistics:
9)
Filtered statistics can
improve query performance for queries that select from well-defined subsets of
data.
10)
Filtered statistics use a
filter predicate to select the subset of data that is included in the
statistics.
11)
Well-designed filtered
statistics can improve the query execution plan compared with full-table
statistics
Example:
CREATE STATISTICS ContactPromotion1
ON Person.Person (BusinessEntityID,
LastName, EmailPromotion)
WHERE
EmailPromotion = 2
WITH SAMPLE 50 PERCENT;
Three options that you can set that affect when
and how statistics are created and updated, these options are set at the
database level only.
·
AUTO_CREATE_STATISTICS Option
·
AUTO_UPDATE_STATISTICS Option
·
AUTO_UPDATE_STATISTICS_ASYNC
AUTO_CREATE_STATISTICS:
1)
When the automatic create statistics
option “ AUTO_CREATE_STATISTICS” is on, the query optimizer creates statistics
on individual columns in the query predicate, as necessary, to improve
cardinality estimates for the query plan.
2)
These single-column statistics are
created on columns that do not already have a histogram in an existing
statistics object.
3)
This option does not determine whether
statistics get created for indexes.
4)
This option also does not generate
filtered statistics.
5)
It applies strictly to single-column
statistics for the full table
Query predicate = where
condition
When the query optimizer creates statistics as a result
of using the AUTO_CREATE_STATISTICS option, the statistics name starts with “ _WA” and column “auto created” in “sys.stats” table is 1 You can use the following query to determine
if the query optimizer has created statistics for a query predicate column.
SELECT OBJECT_NAME(s.object_id) AS object_name,
COL_NAME(sc.object_id, sc.column_id) AS column_name,
s.name AS statistics_name
FROM sys.stats AS s JOIN sys.stats_columns AS sc
ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
WHERE s.name
like '_WA%' and auto_created=1
ORDER BY s.name;
AUTO_UPDATE_STATISTICS:
1)
When the option “ AUTO_UPDATE_STATISTICS” is on, the
query optimizer determines when statistics might be out-of-date and then
updates them when they are used by a query.
2)
Statistics become out-of-date after insert, update, delete,
or merge operations change the data distribution in the table or indexed view.
3)
The query optimizer determines when statistics might be
out-of-date by counting the number of data modifications (DML) since the last
statistics update and comparing the number of modifications to a threshold. The
threshold is based on the number of rows in the table or indexed view.
4)
The query optimizer checks for out-of-date statistics
before compiling a query and before executing a cached query plan
5)
Before compiling a query, the query optimizer uses the
columns, tables, and indexed views in the query predicate to determine which
statistics might be out-of-date.
6)
Before executing a cached query plan, the Database Engine
verifies that the query plan references up-to-date statistics(Statistics should
be updated before query get executed).
7)
This option applies to statistics objects created for(
indexes, single-columns in query predicates{statistics created by auto create
statistics}, and statistics created with the CREATE STATISTICS statement and filtered
statistics)
AUTO_UPDATE_STATISTICS_ASYNC:
The asynchronous statistics update option, determines whether the query
optimizer uses synchronous or asynchronous statistics updates. By default, the
asynchronous statistics update option is off, and the query optimizer updates
statistics synchronously.
1)
This
option applies to statistics objects created for indexes, single columns in
query predicates, and statistics created with the CREATE STATISTICS
statement.
2)
With
asynchronous statistics updates, queries compile with existing statistics even
if the existing statistics are out-of-date.
3)
The
query optimizer could choose a suboptimal query plan if statistics are
out-of-date when the query compiles (query will run and later update statistics).
4)
Queries
that compile after the asynchronous updates have completed will benefit from using
the updated statistics.
5)
Benefit: Your application has experienced client request
time outs caused by one or more queries waiting for updated statistics. In some
cases, waiting for synchronous statistics (AUTO_UPDATE_STATISTICS) could cause
applications with aggressive time outs to fail.
Thanks to Tinnu Babu T Paily for drafting this post
References:
No comments:
Post a Comment