Hotels on Wheels
Parking garages and predicates have a “lot” in common.
GET IT?
No no but seriously. They do.
If you’ve ever lost your ticket, forgotten where you’ve parked, or just driven around hopelessly waiting to see an empty spot, you probably wish there were a better way to do things.
CREATE TABLE dbo.PahkinLaht ( FloorId INT NOT NULL, LaneId INT NOT NULL, SpaceId INT NOT NULL, CarId INT NOT NULL, IsEmpty BIT NOT NULL );
Maybe even a data-driven way of doing things.
Informal poll: Is anyone reading this blog data-driven?
If You Had To Index That…
Say you wanted to quickly find your car. An index on CarId would be awesome, if you know what it is.
If you got a parking slip with your Space Number on it, that’s sort of helpful. But then you need to figure out which floor and lane you’re in.
On the other end, if you need an empty space, an index on IsEmpty sounds pretty optimal.
Let’s say you’re Seinfelding
I don’t mean that, I mean this.
If Jerry, et al. knew where their car was, they’d have been able to walk directly to it. That’s like a Seek predicate.
If they knew which floor and lane they were on, they’d have been able to start Seeking in the right place, and then double check each car. That’s like a Seek predicate with a Residual predicate.
Likewise, if they know the lane and spot number, they could have done this:
Of course, they didn’t know anything, so they had to scan the entire joint. It looked something like this:
I don’t have the patience to paste that many thinking faces in.
There’s much to consider here. But I don’t even have a license, so let’s look at things in a Real Table®
Seek and Not Seek Walk Into A Bar
Using the miniaturized version of the StackOverflow database, I’m gonna focus in on the Users table.
The only thing on it is a Primary Key/Clustered index on the Id column. A query like this can use that quite easily.
SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.Id = 1
This query only requires three reads to locate the page it needs and return a single row.
If we’re talking parking lots, a seek would look something like this.
The PK/CX is considerably less efficient for this query, taking nearly 7400 logical reads to return the same single row.
SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.Reputation = 42510
It’s Not Seek vs Scan Though
What I’m interested in is the Seek Predicate vs Predicate. For instance, this query returns the same results, with the same 3 logical reads as the first query.
SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.Id < 2 AND u.Reputation > 42509
- A predicate by itself is like having to walk through the entire parking garage trying to find your car.
- A seek predicate by itself is like being able to start your search where your car is.
- A seek predicate with a residual predicate is like walking along along the right lane and having to check parking space numbers.
What Makes For A Seek Predicate vs a Residual Predicate?
Mostly placement in the index, but SARGability plays an important role as well.
Let’s create an index and look at a couple slightly different queries.
CREATE INDEX ix_helper ON dbo.Users(Reputation, Id) INCLUDE (DisplayName); SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WITH (INDEX = PK_Users_Id) WHERE u.Id = 1 AND u.Reputation = 42510 SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WITH (INDEX = ix_helper) WHERE u.Id = 1 AND u.Reputation = 42510
This index leads with reputation, and also has Id as a key column. DisplayName is an included column, which puts it in just about the same place as Reputation was when we were only working with the clustered index.
The first query plan looks like this, seeking to the right key, and then evaluating the predicate on Reputation. This one is using the clustered index.
The second query uses the nonclustered index, but supports multiple seek keys.
When I mentioned that SARGability plays a part, that’s because if you do something this stupid, neither predicate will be seekable.
SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE ABS(u.Id) = 1 AND ABS(u.Reputation) = 42510
Bumming Around
While I have your attention, and before I run DropIndexes, let’s look at how included columns work a little bit.
Right now, DisplayName is included in our nonclustered index. All three of these queries can use our index.
SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.DisplayName LIKE 'Jeff Atwood%' SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.DisplayName LIKE '%Jeff Atwood%' SELECT Id, u.DisplayName, u.Reputation FROM dbo.Users AS u WHERE u.DisplayName LIKE '%Jeff Atwood'
All of them will do the same number of logical reads (1279).
All three of them will have our LIKE condition as a predicate — we can’t seek here because it’s not in the key of the nonclustered index.
But only the first query, where the string search doesn’t lead with a wildcard will register a missing index.
In other words, a lack of SARGability with leading wild card searches shut down missing index requests.
If we add the missing index request, the reason becomes pretty apparent.
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[Users] ([DisplayName]) INCLUDE ([Reputation])
The index really does only help the first query. The other two queries do the same number of logical reads as before. Adding this index doesn’t change much for them.
Beat It
This is a small step towards understanding seek and residual predicates, and how indexes support searches.
The thing to keep in mind is that you can really only seek into an index when a SARGable predicate hits the leading key column. Multi-key indexes can support multiple seek keys.
If you’re only searching secondary (or beyond) columns, you’re only able to scan with a regular predicate. The same goes for included columns.
Thanks for reading!
Brent's presenting at the upcoming GroupBy: Getting Better Query Plans. Register free.