Indexing for GROUP BY

it’s not glamorous

And on your list of things that aren’t going fast enough, it’s probably pretty low. But you can get some dramatic gains from indexes that cover the columns you’re aggregating.

We’ll take a quick walk down demo lane in a moment, using the Stack Overflow database.

query outta nowhere!

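SET NOCOUNT ON;

-- Report CPU/elapsed time and page reads for each statement
SET STATISTICS TIME, IO ON;

-- Total bounty per user and bounty amount, skipping votes with no bounty
SELECT [v].[UserId], [v].[BountyAmount], SUM([v].[BountyAmount]) AS [BountyTotal]
FROM [dbo].[Votes] AS [v]
WHERE [v].[BountyAmount] IS NOT NULL
GROUP BY [v].[UserId], [v].[BountyAmount];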

Looking at the plan, it’s pretty easy to see what happened. Since the data is not ordered by an index (the clustered index on this table is on an Id column not referenced here), a Hash Match Aggregate was chosen, and off we went.

Look how much fun we’re having.

Zooming in a bit on the Hash Match, this is what it’s doing. It should look pretty familiar if you’ve ever seen a Hash Match used for a JOIN. The only difference here is that the hash table is built from the grouping columns, then scanned and output. When used in a JOIN, there’s also a probe phase: rows from the second input are hashed and matched against the build side, with a residual predicate confirming the matches, and then the results are output.

It’s basically wiping its hands on its pants.

It took quite a bit of activity to do a pretty simple thing.

/* Table 'Votes'. Scan count 5, logical reads 315406, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 3609 ms, elapsed time = 1136 ms.
*/

Since this query is simple, our index is simple.

-- Filter column first, then the remaining grouping column
CREATE NONCLUSTERED INDEX [IX_GRPBY] ON [dbo].[Votes]
(
[BountyAmount], [UserId]
);

I’m using the BountyAmount column in the first position because we’re also filtering on it in the query. We don’t really care about the SUM of all NULLs.

Taking that new index out for a spin, what do we end up with?

Stream Theater

The Hash Match Aggregate has been replaced with a Stream Aggregate, and the Scan of the Clustered Index has been replaced with a Seek of the Non-Clustered Index. This all took significantly less work:

/* Table 'Votes'. Scan count 1, logical reads 335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 278 ms.
*/

Let’s zoom in on the Stream Aggregate operator too, since we gave the Hash Match so much attention. Good behavior should be rewarded.

You make it look so easy, Stream Aggregate.
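If you want to compare the two aggregation strategies side by side with the index in place, one option is to force each of them with a query hint. This is only a sketch for experimenting on a test copy of the database: HASH GROUP and ORDER GROUP are standard T-SQL query hints, but the plans and timings you get will depend on your version, hardware, and data, so don’t expect the exact numbers shown above.

```sql
SET STATISTICS TIME, IO ON;

-- Force a Hash Match Aggregate, even though the index supplies ordered rows
SELECT [v].[UserId], [v].[BountyAmount], SUM([v].[BountyAmount]) AS [BountyTotal]
FROM [dbo].[Votes] AS [v]
WHERE [v].[BountyAmount] IS NOT NULL
GROUP BY [v].[UserId], [v].[BountyAmount]
OPTION (HASH GROUP);

-- Force a Stream Aggregate; the optimizer adds a Sort if no index supplies the order
SELECT [v].[UserId], [v].[BountyAmount], SUM([v].[BountyAmount]) AS [BountyTotal]
FROM [dbo].[Votes] AS [v]
WHERE [v].[BountyAmount] IS NOT NULL
GROUP BY [v].[UserId], [v].[BountyAmount]
OPTION (ORDER GROUP);
```

Hints like these are handy for demos, but they’re best left out of production code unless you have a specific reason to overrule the optimizer.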

Filters, filters, filters

If we want to take it a step further, we can filter the index to avoid the NULLs altogether.

-- Replace the existing index in place with a filtered version
CREATE NONCLUSTERED INDEX [IX_GRPBY] ON [dbo].[Votes]
(
[BountyAmount], [UserId]
) WHERE [BountyAmount] IS NOT NULL
WITH (DROP_EXISTING = ON);

This results in very slightly reduced CPU and IO. The real advantage of filtering the index here is that it takes up nearly 2 GB less space than without the filter. Collect two drinks from your SAN admin.

/* Table 'Votes'. Scan count 1, logical reads 333, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 233 ms.
*/
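To check the space difference on your own copy of the database, a query along these lines against the catalog views should work. It’s a rough sketch: it assumes the table is dbo.Votes as in the examples above, and it reports used pages converted to MB at 8 KB per page. The column aliases are made up for illustration. Run it once before and once after rebuilding the index with the filter, since DROP_EXISTING replaces the unfiltered version.

```sql
-- Size and row count of every index on dbo.Votes
SELECT [i].[name] AS [IndexName],
       [i].[has_filter] AS [IsFiltered],
       [i].[filter_definition] AS [FilterDefinition],
       SUM([ps].[row_count]) AS [IndexRows],
       SUM([ps].[used_page_count]) * 8 / 1024.0 AS [UsedMB]
FROM [sys].[indexes] AS [i]
JOIN [sys].[dm_db_partition_stats] AS [ps]
    ON [ps].[object_id] = [i].[object_id]
   AND [ps].[index_id] = [i].[index_id]
WHERE [i].[object_id] = OBJECT_ID(N'dbo.Votes')
GROUP BY [i].[name], [i].[has_filter], [i].[filter_definition];
```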

And, because I knew you’d ask, I did try making the same index with the column order reversed. It was not more efficient, because it ended up doing a Scan of the Non-Clustered Index instead, which results in a bit more CPU time.
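If you’d like to repeat that experiment, the reversed index might look something like the sketch below. The name IX_GRPBY_REVERSED is made up for illustration, and keeping the filter on it is an assumption; the exact definition used in the original test isn’t shown here. The table hint is only there to make sure the query uses the index under test.

```sql
-- Hypothetical reversed-order index: UserId first, BountyAmount second
CREATE NONCLUSTERED INDEX [IX_GRPBY_REVERSED] ON [dbo].[Votes]
(
    [UserId], [BountyAmount]
) WHERE [BountyAmount] IS NOT NULL;

-- Point the query at the reversed index and compare the plan and statistics
SELECT [v].[UserId], [v].[BountyAmount], SUM([v].[BountyAmount]) AS [BountyTotal]
FROM [dbo].[Votes] AS [v] WITH (INDEX ([IX_GRPBY_REVERSED]))
WHERE [v].[BountyAmount] IS NOT NULL
GROUP BY [v].[UserId], [v].[BountyAmount];

-- Clean up once you’ve seen enough
DROP INDEX [IX_GRPBY_REVERSED] ON [dbo].[Votes];
```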

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.
