Thursday 6 September 2018

Interpreting the query plan and speeding it up

  • Bookmark Lookup - An index was used but as the index does not 'cover' the query the query processor must use the ROWID from the index to 'lookup' the actual data row. Unless the index can be alter to ‘cover’ the columns of the query this cannot be optimised further.
  • Clustered Index Scan - Similar to a table scan (its just that the table has a clustered index) and similarly undesirable, particularly for large tables. If you see these then the columns in the query are either not indexed or the distribution statistics have led the query optimiser to decide the scan is more/ as efficient. Restructuring the query can sometimes eliminate these operations; or add an index.
  • Clustered Index Seek - The clustered index is used to find matching rows. This is optimal behaviour.
  • Compute Scalar - As it says – a scalar is being computed. Unless this is not really needed it can't be optimised. It its not needed, remove it!
  • Constant Scan - The query requires a constant value in some (or all) rows
  • Hash Match - The query processor builds a hash table for each row being processed. As subsequent rows are processed, the hash is computed and compared to the hash table for matches. Queries with DISTINCT, UNION, or aggregates often require a hash table to remove duplicates. If such operations are required there is little you can do to optimise.
  • Index Scan - The non-clustered index is being used to locate a large number of data rows which must then be scanned. Unless you can restructure the query to return fewer rows this cannot be optimised.
  • Index Seek - A non-clustered index is being used and only a small part of the index is required. You've chosen your indexes well as this is one of the most efficient operations.
  • Index Spool - As rows are scanned tempdb is used to store a 'spool' table that can be used rather than re-reading input rows. This is an internal optimization for complex queries and cannot be changed.
  • Merge Join - Occurs when the two inputs contain sorted data and the query processor can merge them together, or when two or more indexes are used to query a table. Very efficient but to occur the joins must have access to sorted data – achieved by indexing on join or ORDER BY columns.
  • Nested Loop - In this join, one table is chosen as the inner and scanned for each row of the outer. This is only efficient for small numbers of rows. Restructuring the query can remove these joins. Note that if available memory is low these are more likely to occur.
  • Remote Query - As it says – a query occurring on a remote data source so you either need to optimise on the remote data source or move the necessary data to your local SQLServer.
  • Sort - Is expensive but tends also to be necessary.
  • Table Scan - Is performed when the table has no clustered index (its a heap). Probably the least desirable operation to see chosen as the query processor will read each row into memory in order to decide whether it should be returned by the query. For tables of more than a few hundred rows you should add an appropriate index. Some of the gains in performance come from writing well structured, efficient queries that return the minimum amount of information needed, but most gains in performance are made by choosing good indexes. Hence the basic recommendation is to ensure that there are indexes on all tables and that the statistics for those indexes are up to date. Not coincidentally, these are the two main factors that influence the query optimiser in making decisions about the execution plan. At this stage I would recommend you run a few multi-table queries from the Northwind database, or your own, in Query Analyzer and assess whether the execution plans generated are optimal based on the above information.
  • Optimising queries - As just stated, indexes and rewriting queries are the most common options for optimisation but you can also use query hints though care should be taken to ensure you’re not forcing the server to choose a sub-optimal execution plan. Generally query hints are not recommended – SQLServer knows best! Considering indexes: in order for SQLServer to make use of an index, the first indexed column must be included in the WHERE clause as part of a qualification, or it will not be considered. However, all columns in a WHERE clause do not need to be included in the index for it to be chosen. There can only be one clustered index per table and, as a result, it should be chosen carefully. By default the primary key is the clustered index but often this is not the best choice unless it is the only index that will exist on the table. Clustered indexes are best used for range queries, e.g. dates or numerical ranges. Non-clustered indexes work best on large tables when very few rows are being returned. The query optimiser will often choose a non-clustered index that is highly selective when the index columns are included in the join statements. When this occurs SQLServer can find the row needed in the index very quickly and get to the actual data quickly as well. If not selective the process is much less efficient and the index may not be chosen. An exception is if the non-clustered index is a covering index. Foreign keys are usually good choices for non-clustered indexes as usually these columns are used in joins. Regardless of the type of index there are a few general guidelines for creating indexes:
    1. Index those columns often used by joins
    2. Be sure that the first column of the index is the column specified in most joins.
    3. Analyze your queries and ensure your indexes are being used; if not they are an unnecessary overhead and/ or indexes could be better placed
    4. CONSIDER FILEGROUPS: placing non clustered indexes on a different filegroup on a distinct physical device than the data itself can lead to performance gains

How to compare one query against another to see which is best

Highlight both queries, and press CTRL-L to view the query plans of both.

The two percentages will total 100%. In this case, the first query at 98% sucks, and the 2% one wins by a mile. The lower % the cost, the better.

 

 What is the true cost of a query?

Click on the FIRST box of the query plan, and hover your mouse over it. A pop up window will appear. The cost of the query is the "Estimated Subtree Cost".

If Estimated Subtree Cost is < 20, its OK. If it's > 20, you really need to change it, or add in an index, etc.

 

 My query is slow, how to easily speed it up

I used to get this a lot. And it either boils down to an "OR" clause, or a non-sargable query.

 

Get rid of 'OR' clauses

If you query contains an OR clause in the query, Immediately split the query into two queries, one query using the left side of the or clause, and the other the right side. Then join the results with a UNION ALL. Don't use UNION as that is slow due to it having to remove duplicates between the two result sets.

Example:

Slow query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
        OR FC.name LIKE 'acme%';

Faster query
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FP.forename = 'Simon'
UNION ALL
SELECT  *
FROM    FieldPerson AS FP
        INNER JOIN FieldCompany AS FC
            ON FC.id = FP.field_company_id
WHERE   FC.name LIKE 'acme%';

Even a simple case as those two above the query plan comparison is 55% vs 45%.

 

Non-Sargable queries

The most common thing that will make a query non-sargable is to include a field inside a function in the where clause:

SELECT ... FROM ... WHERE Year(myDate) = 2008

The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
 
WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009'

Some other examples:
 
Bad:  Select ... WHERE ISNULL(FullName,'Simon Hughes') = 'Simon Hughes'
Fixed: Select ... WHERE ((FullName = 'Simon Hughes') OR (FullName IS NULL))

Bad:  Select ... WHERE SUBSTRING(Forename, 3) = 'Sim'
Fixed: Select ... WHERE Forename Like 'Sim%'

Bad:  Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm, -30, GetDate())