Quantcast
Channel: Erik Darling – Brent Ozar Unlimited®
Viewing all articles
Browse latest Browse all 370

Indexing for Windowing Functions

$
0
0

Hooray Windowing Functions

They do stuff that used to be hard to do, or took weird self-joins or correlated sub-queries with triangular joins to accomplish. That’s when there’s a standalone inequality predicate, usually for getting a running total.

With Windowing Functions, a lot of the code complexity and inefficiency is taken out of the picture, but they still work better if you feed them some useful indexes.

What kind of index works best?

In general, what’s been termed a POC Index by Itzik Ben-Gan and documented to some extent here.

POC stands for Partition, Order, Covering. When you look at your code, you want to first index any columns you’re partitioning on, then any columns you’re ordering by, and then cover (with an INCLUDE) any other columns you’re calling in the query.

Note that this is the optimal indexing strategy for Windowing Functions, and not necessarily for the query as a whole. Supporting other operations may lead you to design indexes differently, and that’s fine.

Everyone loves a demo

Here’s a quick example with a little extra something extra for the indexing witches and warlocks out there. I’m using the Stack Exchange database, which you can find out how to make your favorite new test database here.

SET NOCOUNT ON

SET STATISTICS IO, TIME ON  

SELECT  [p].[OwnerUserId] ,
        [p].[CreationDate] ,
        SUM([p].[ViewCount]) OVER ( PARTITION BY [p].[OwnerUserId] ORDER BY [p].[CreationDate] ) AS [TotalViews]
FROM    [dbo].[Posts] AS [p]
WHERE   [p].[PostTypeId] = 1
        AND [p].[Score] > 0
        AND [p].[OwnerUserId] = 4653
ORDER BY [p].[CreationDate]
OPTION  ( RECOMPILE );

/*
Table 'Posts'. Scan count 5, logical reads 488907, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1180, logical reads 7095, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2328 ms,  elapsed time = 760 ms.
*/

The query above is running on the Posts table which only has a Clustered Index on the Id column, that does us absolutely no good here. There are tons of access operations and logical reads. Taking a look at the plan doesn’t offer much:

I am a plan. Love me.

I am a plan. Love me.

Let’s try a POC index to fix this up. I’m keeping ViewCount in the key because we’re aggregating on it. You can sometimes get away with just using it as an INCLUDE column instead.

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) 

We can note with a tone of obvious and ominous foreshadowing that creating this index on the entire table takes about 15 seconds. Insert culturally appropriate scary sound effects here.

Here’s what the plan looks like running the query again:

I'm a Scorpio. I like Datsuns and Winston 100s.

I’m a Scorpio. I like Datsuns and Winston 100s.

That key lookup is annoying.

Not all key lookups are due to output columns. Some of them are predicates.

Not all key lookups are due to output columns. Some of them are predicates.

We did a good job of reducing a lot of the ickiness from before:

/*
Table 'Worktable'. Scan count 1180, logical reads 7095, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 4973, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 168 ms.
*/

But we’re not happy. Why? Because we’re DBAs. Or developers. Or we just have to use computers, which are the worst things ever invented.

Behold the filtered index

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) 
WHERE [PostTypeId] = 1 AND [Score] > 0
WITH (DROP_EXISTING = ON)

Cool. This index only takes about three seconds to create. Marinate on that.

This query is so important and predictable that we can roll this out for it. How does it look now?

Well, but, WHY?

Well, but, WHY?

That key lookup is still there, and now 100% of the estimated magickal query dust cost. For those keeping track at home, this is the entirely new missing index SQL Server thinks will fix your relationship with your dad:

/*
USE [StackOverflow]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([OwnerUserId],[PostTypeId],[Score])
INCLUDE ([CreationDate],[ViewCount])
GO
*/

But we took a nice chunk out of the IO and knocked a little more off the CPU, again.

/*
Table 'Worktable'. Scan count 1180, logical reads 7102, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 91 ms.
*/

What can we do here?

Include!

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) INCLUDE ([PostTypeId], [Score])
WHERE [PostTypeId] = 1 AND [Score] > 0
WITH (DROP_EXISTING = ON)

Running the query one last time, we finally get rid of that stinky lookup:

Bully for you!

Bully for you!

And we’re still at the same place for IO:

/*
Table 'Worktable'. Scan count 1180, logical reads 7102, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 90 ms.
*/

What did we learn? Windowing functions are really powerful T-SQL tools, but you still need to be hip to indexing to get the most out of them.

Check out our free resources on Windowing Functions here.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Jeff Moden talks at length about triangular joins here (registration required).


Viewing all articles
Browse latest Browse all 370