Buskets
Much like Sorts, Hash Joins require some amount of memory to operate efficiently — without spilling, or spilling too much.
And to a similar degree, the number of rows and columns passed to the Hashing operator matters where the memory grant is concerned. This doesn’t mean Hashing is bad, but you may need to take some extra steps when tuning queries that use it.
The reasons are pretty obvious when you think about the context of a Hash operation, whether it’s a join or aggregation.
- All rows from the build side have to arrive at the operator (in parallel plans, usually after a bitmap filter)
- The hashing function gets applied to join or grouping columns
- In a join, hashed values from the probe side are looked up in the hash table built from the build side
- In some cases, the actual values need to be checked as a residual
During all that nonsense, all the columns that you SELECT get dragged along for the ride.
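If you want to see those steps in a plan of your own, one option is to ask for a hash join outright. This is only a sketch: it assumes the same Stack Overflow 2010 tables used in this post, and the hint is there for experimenting, not something to leave in real code.

```sql
/* Assumption: the Stack Overflow 2010 sample database */
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 500000
OPTION (HASH JOIN); /* restrict the optimizer to hash joins for this query */
```

In the actual execution plan, the Hash Match operator’s properties show the Hash Keys Build and Hash Keys Probe columns, plus a Probe Residual when the actual values get rechecked.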
Here’s a quick example!
This query doesn’t return any rows, because Jon Skeet hadn’t hit 1 million rep in the data dump I’m using (Stack Overflow 2010).
/*Returns nothing*/
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 1000000;
Despite that, the query asks for about 7 MB of memory to run. This seems to be the lowest memory grant I could get the optimizer to ask for.
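If you’d rather check the grant without opening the plan, the plan cache is one place to look. This is a sketch under a few assumptions: SQL Server 2016 or later (where sys.dm_exec_query_stats exposes the grant columns), VIEW SERVER STATE permission, and a plan that’s still in cache. The LIKE pattern is just my guess at something that matches the queries in this post, so adjust it for yours.

```sql
SELECT TOP (10)
    st.text AS query_text,
    qs.last_grant_kb,       /* memory granted on the most recent execution */
    qs.last_used_grant_kb,  /* how much of that grant was actually used */
    qs.last_ideal_grant_kb  /* what the query would have liked to get */
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%dbo.Posts AS p%'            /* hypothetical filter pattern */
  AND st.text NOT LIKE N'%dm_exec_query_stats%'   /* skip this query itself */
ORDER BY qs.last_execution_time DESC;
```

The same number shows up as the Memory Grant property on the root operator of the execution plan.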
If we drop the Reputation filter down a bit so some rows get returned, the memory grant stays the same.
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 500000;
That’s why I’m calling 7 MB the “base” grant here. That, and if I drop the Reputation filter lower to allow more people in, the grant will go up.
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 400000;
But we can also get a grant higher than the base by requesting more columns.
SELECT u.DisplayName, p.Title
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
WHERE u.Reputation >= 500000;
Pushing the grant up is easiest with string data. Again, just like with Sorts, we don’t need to actually sort by string data for the memory grant to go up. We just need to make it pass through a memory-consuming operator.
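To get a feel for why a string column costs more than a narrow one, you can look up the declared sizes of the columns being dragged through the join. A small sketch, assuming the dbo.Users and dbo.Posts tables from the Stack Overflow sample database:

```sql
SELECT
    c.TABLE_NAME,
    c.COLUMN_NAME,
    c.DATA_TYPE,
    c.CHARACTER_MAXIMUM_LENGTH /* in characters; -1 means a MAX type */
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_SCHEMA = N'dbo'
  AND (
          (c.TABLE_NAME = N'Users' AND c.COLUMN_NAME = N'DisplayName')
       OR (c.TABLE_NAME = N'Posts' AND c.COLUMN_NAME = N'Title')
      );
```

For variable-length columns, the optimizer generally guesses that each value fills about half the declared size, so wide declarations can inflate the grant even when the real data is short.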
Thanks for reading!
Brent says: you remember how, in the beginning of your career, some old crusty DBA told you to avoid SELECT *? Turns out they were right.