
Erik Darling Joins Brent Ozar Unlimited


Kendra here– we’re really excited to announce that we’ve hired Erik Darling on to join our team as a consultant. Erik’s first day is today.

To introduce Erik, we're sharing some of his answers to our call for job applications. We were so impressed, we knew we had to talk to him right away.

What are your favorite things about your DBA job?

Getting a 10 production server/100tb of data environment configured to best practices and maintained and (WEEKLY) DBCC checked.

Migrating legacy apps to new VMs.

Helping the devs with in-house applications and reports (lots of code and index tuning).

Setting up Failover Clusters, install and config SQL.

Learning, learning, learning.

Being the only DBA.

What parts of the DBA job do you not enjoy?

Filling out paperwork, archiving to tape, being on call from 7am to midnight seven days a week, being the only DBA.

Tell us about a time you broke a production server.

So this one time I had to move a 5tb database to a near-line SATA LUN, and I was all like, “GUI? GU you! I’m gonna PoSH this ogre to icy hell.”

That’s when I found out that the Move-Item command basically eats like all the memory it can if you don’t have LPIM set up for SQL, and this dual 12, 512GB behemoth basically churned to a halt, couldn’t RDP, couldn’t VNC, nothing.

On the plus side, I could use PowerShell to kill the PowerShell process remotely.

After that I just restored a copy of the database to the new LUN and dropped the old one. That went a lot better.

Why would you be great at this job?

I like solving big problems. I like teaching people how to solve little problems. I once taught a developer who barely spoke English how to cross apply with a drawing. It may or may not have been a New Order album cover. I can talk about SQL all day. Most of it is right, too, especially if I think about it first. I am all self taught and have never had anyone senior to me, so I have plenty of straight up horribly grim determination to learn things and put them into practice.

What do you think will be the hardest part of this job for you?

Not sneaking onto Jeremiah’s Twitter account to change his name to Bodega Mayonnaise.

Brent says: I swear, we didn’t just hire him because of the tattoos. But I’m not going to say they weren’t a contributing factor.

Jeremiah says: I’m glad to have a co-worker who shares my appreciation of plaid, tattoos, and crazy T-SQL.


Online Index Creation and Cruel Defaults


Have you ever been watching queries crawl, feeling sort of helpless because you can’t change them? And even if you could, it’s hitting a new custom column in an ISV table that no one bothered to tell you about. I’ve been there. I’ve been pomade deep there.

I’ve also had my Robot tell me: “there’s an index for that, Erik.”

And Robot was right! But how could I sneak an index on this 180 million row table without making things worse? It’s the middle of a production day. Slow is bad; blocked is worse.

Well, creating an index ONLINE just needs a lil’ lock at the beginning and then a lil’ lock at the end, and just like the Ghost of Milton Berle that watches you while you sleep, you’ll never know it was there.

But you have to specify the ONLINE option, even in Enterprise Edition, that your company lovingly spent your bonuses for the next decade to license. Here is an example that breaks contrived speed:

AdventureWorks went out of business because of queries like this.
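
The query screenshot didn't survive, but given the missing index suggestion below, it was presumably something in this spirit (a sketch; the literal ID value is made up):

SELECT [sod].[SalesOrderDetailID]
FROM [Sales].[SalesOrderDetail] AS [sod]
WHERE [sod].[SalesOrderDetailID] = 42; /* hypothetical value */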

Seems reasonable. What say you, execution plan?

>Implying I need an index

Hey, look, it’s that thing Robot told me about. Thanks, Robot.

Except unlike Robot, I have a feeling. Just one. And today that feeling cares about end user experience and satisfaction. I want to get this index into the party pronto. This is the index that SQL and Robot agree on. But c’mon, we both know there’s WAYYYY more stuff going on here.

/*
Missing Index Details from SQLQuery4.sql - NADA.AdventureWorks2014 (sa (59))
The Query Processor estimates that implementing the following index could improve the query cost by 99.0347%.
*/

/*
USE [AdventureWorks2014]
GO
CREATE NONCLUSTERED INDEX [IX_SOD_SODID]
ON [Sales].[SalesOrderDetail] ([SalesOrderDetailID])

GO
*/

To prove it, we can script the index out from the GUI.

And here's what that gives us. Note that ONLINE = OFF here.

If we go back in and check out the options, we can turn ONLINE = ON.

And now, looking at how it scripts out, ONLINE = ON.
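
The screenshots don't survive here, but the script the GUI produces should look roughly like this (a sketch; in practice the GUI emits a longer WITH options list):

USE [AdventureWorks2014]
GO
CREATE NONCLUSTERED INDEX [IX_SOD_SODID]
ON [Sales].[SalesOrderDetail] ([SalesOrderDetailID])
WITH (ONLINE = ON)
GO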

Which means it’s totally safe to just roll this out any ol’ time.

Just kidding, follow change management procedures and TEST TEST TEST. Also remember that robots and missing index hints are sometimes just plain dumb. They’re not terribly aware of existing indexes, and they don’t consider the columns they’re asking you to index. I’ve seen SQL demand indexes on 4 – 5 (MAX) columns, or ask for just one column to be indexed that could easily be covered or included in another existing index.

If you want to see what’s up with your indexes, missing, existing, hypothetical, or somewhere in the middle of that Bizarre Love Triangle, click here to check out sp_BlitzIndex®.

Kendra says: I keep trying to use ONLINE=MOSTLY, because that seems accurate, but it never works.

Brent says: Once I got started typing out CREATE INDEX by hand, I fell into the rut of doing it every single time, and I forgot about all the cool options available in the GUI. I’m conditioned to just say ONLINE = ON, but there’s other cool options too – like 2014’s amazing performance improvements if you sort in tempdb.

CTEs, Inline Views, and What They Do


By now, you have probably heard of CTEs

And you may have even heard them referred to as Inline Views. Really, an Inline View can be any type of derived table. It's easy to illustrate how a CTE can turn into a performance problem if you aren't careful.

A lot of people think that when you call a CTE, the results are somehow persisted in a magical happy place and the underlying query just hangs back admiring the output as it sails into the upper deck.

Take the following example, which serves no real purpose.

WITH c1 AS (
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
FROM [Sales].[SalesPerson] AS [sp]
WHERE [sp].[TerritoryID] = 6
)
SELECT [c1].[BusinessEntityID]
, [c1].[TerritoryID]
, [c1].[CommissionPct]
, [c1].[SalesYTD]
FROM [c1] AS [c1]
WHERE [c1].[BusinessEntityID] = 278

It has a perfectly reasonable execution plan, and will lead a happy life.

Jimmy Execution Plan went to college, married his high school sweetheart, and had three red-headed Execution Plan kids.

Now, let’s pretend some additional request requires us to locate additional Business Entities in the same territory. No problemo.

WITH c1 AS (
SELECT [SP].[BusinessEntityID]
     , [SP].[TerritoryID]
     , [SP].[CommissionPct]
     , [SP].[SalesYTD]
FROM [Sales].[SalesPerson] AS [SP]
WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c1].[BusinessEntityID]
     , [c1].[TerritoryID]
     , [c1].[CommissionPct]
     , [c1].[SalesYTD]
FROM [Sales].[SalesPerson] AS [sp]
JOIN [c1] AS [c1]
ON [c1].[TerritoryID] = [sp].[TerritoryID]
WHERE [sp].[BusinessEntityID] = 278
AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]

Doesn’t get much easier than that. But what happened with the plan?

Happy Hour! Two for the price of shut up and drink!

Huh. That’s a whole other index operation. So just like when you join to a view, the view has to be executed and returned. In fact, if you keep throwing joins that reference the original CTE, you’ll keep getting more index operations.

WITH c1 AS (
SELECT [SP].[BusinessEntityID]
     , [SP].[TerritoryID]
     , [SP].[CommissionPct]
     , [SP].[SalesYTD]
FROM [Sales].[SalesPerson] AS [SP]
WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c1].[BusinessEntityID]
     , [c1].[TerritoryID]
     , [c1].[CommissionPct]
     , [c1].[SalesYTD]
FROM [c1] AS [c1]
JOIN [Sales].[SalesPerson] AS [sp]
ON [c1].[TerritoryID] = [sp].[TerritoryID]
JOIN [c1] AS [c2]
ON [c2].[BusinessEntityID] = [c1].[BusinessEntityID]
WHERE [sp].[BusinessEntityID] = 278
AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]

Did someone say they wanted another index operation? Because I thought I heard that.

Just because it's a Seek doesn't mean it's okay.

And, just to drive the point home, let’s add another one:

WITH c1 AS (
SELECT [SP].[BusinessEntityID]
     , [SP].[TerritoryID]
     , [SP].[CommissionPct]
     , [SP].[SalesYTD]
FROM [Sales].[SalesPerson] AS [SP]
WHERE [SP].[TerritoryID] = 6
)
SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
     , [c2].[BusinessEntityID]
     , [c2].[TerritoryID]
     , [c2].[CommissionPct]
     , [c2].[SalesYTD]
FROM [c1] AS [c1]
JOIN [Sales].[SalesPerson] AS [sp]
ON [c1].[TerritoryID] = [sp].[TerritoryID]
JOIN [c1] AS [c2]
ON [c1].[BusinessEntityID] = [c2].[BusinessEntityID]
JOIN [c1] AS [c3]
ON [c1].[BusinessEntityID] = [c3].[BusinessEntityID]
WHERE [sp].[BusinessEntityID] = 278
AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID]

Huzzah! Quadruplets!

If you see a pattern forming here, you'll be great at quilting, which you should do instead of writing code like this.

To sum things up, CTEs are a great base

From which you can reference and filter on items in the select list that you otherwise wouldn't be able to (think windowing functions), but every time you reference a CTE, it gets executed. The fewer times you have to hit a larger base set, and the fewer reads you do, the better. If you find yourself referencing a CTE more than once or twice, you should consider a temp or persisted table instead, with the proper indexes.
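
A minimal sketch of that rewrite, reusing the last query above (the temp table and index names are mine):

SELECT [sp].[BusinessEntityID]
     , [sp].[TerritoryID]
     , [sp].[CommissionPct]
     , [sp].[SalesYTD]
INTO #c1
FROM [Sales].[SalesPerson] AS [sp]
WHERE [sp].[TerritoryID] = 6;

CREATE CLUSTERED INDEX [CX_c1] ON [#c1] ([TerritoryID], [BusinessEntityID]);

/* SalesPerson is only read once, above. Every reference
   below hits the small, indexed temp table instead. */
SELECT [sp].[BusinessEntityID]
     , [c1].[BusinessEntityID]
FROM [#c1] AS [c1]
JOIN [Sales].[SalesPerson] AS [sp]
ON [c1].[TerritoryID] = [sp].[TerritoryID]
JOIN [#c1] AS [c2]
ON [c1].[BusinessEntityID] = [c2].[BusinessEntityID]
WHERE [sp].[BusinessEntityID] = 278
AND [c1].[BusinessEntityID] <> [sp].[BusinessEntityID];

DROP TABLE [#c1];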

Want to Up Your Query Writing Game?

Brent says: when I first start writing a query, I write for readability first. Make the query easy to understand – and CTEs can help big time here.

Kendra says: It drives some people nuts if you preface your CTE with a semi-colon to make sure the prior statement was properly terminated. But I won’t judge you if that’s what you like to do.

SELECT INTO and non-nullable columns


SELECT…INTO  is one of my favorite SQL Server features.

It’s great for creating table skeletons with false WHERE clauses (1=2), moving a real table to a staged/temp location for testing, etc.

In SQL Server 2014

It acquired the ability to go parallel, which is pretty neat, but that’s not what we’re talking about here.

It has some limitations

Chief among them is this:

Indexes, constraints, and triggers defined in the source table are not transferred to the new table, nor can they be specified in the SELECT…INTO statement. If these objects are required, you can create them after executing the SELECT…INTO statement.

Which is… Sort of true. There’s a trick, and I’ll show you a quick example here with another of my favorite things: a Numbers table.

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT [N].[N]
INTO #NumbersTest
FROM [Numbers] N

ALTER TABLE #NumbersTest ADD CONSTRAINT [PK_Numbers] 
PRIMARY KEY CLUSTERED (N) WITH (FILLFACTOR = 100)

Trying to add the PK constraint here fails, because the column is NULLable

Msg 8111, Level 16, State 1, Line 37
Cannot define PRIMARY KEY constraint on nullable column in table '#NumbersTest'.
Msg 1750, Level 16, State 0, Line 37
Could not create constraint or index. See previous errors.

We can verify this by looking at the table metadata:

SELECT [columns].[name], [columns].[is_nullable]
FROM tempdb.sys.columns 
WHERE [object_id] = OBJECT_ID(N'tempdb..#NumbersTest');


name     is_nullable
N           1

So how do we fix this? We could alter the table, but that won’t leave us with the lasting satisfaction of proving BOL wrong on a technicality. We’ll adjust our code a bit, and try again.

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT 
ISNULL([N].[N], 0) AS N  /* <--- The magic is here! */
INTO #NumbersTest_IN
FROM [Numbers] N

ALTER TABLE #NumbersTest_IN ADD CONSTRAINT [PK_Numbers] 
PRIMARY KEY CLUSTERED (N) WITH (FILLFACTOR = 100)

This time, with the addition of an ISNULL check on the column, it ‘inherits’ the not NULLable property, and the PK constraint adds successfully. We can verify that by checking the table metadata, if you don’t believe me:

name     is_nullable
N           0

Note that this same behavior does not occur if you replace ISNULL() with COALESCE().
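
If you don't believe that either, here's a quick sketch you can run (the table name is mine; any row source works for the CTE):

;WITH Numbers AS (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM [sys].[objects])
SELECT COALESCE([N].[N], 0) AS N /* COALESCE instead of ISNULL */
INTO #NumbersTest_CO
FROM [Numbers] N

SELECT [columns].[name], [columns].[is_nullable]
FROM tempdb.sys.columns 
WHERE [object_id] = OBJECT_ID(N'tempdb..#NumbersTest_CO');

/* is_nullable comes back 1 -- still NULLable, and the PK constraint still fails */

DROP TABLE [#NumbersTest_CO]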

And with that, you can clean up the test tables. Or keep them around. Do some math. Whatever blows your hair back.

Got a favorite use for a numbers table outside of string splitting or doing date math? Let me know in the comments! I may write a follow up.

DROP TABLE [#NumbersTest]
DROP TABLE [#NumbersTest_IN]

Brent says: Wow. That is a really slick trick.

Kendra says: I know some people have bias against SELECT INTO because it seems sloppy and you have to go the extra mile to get the right types, but it can be a great tool. Don’t rule it out.

Jeremiah says: I really like this method – it’s a quick way to copy tables without using other cumbersome techniques

And Party and Alt Shift


This is a cool SSMS trick I picked up a while back

Though not nearly as far back as I wish I had. It’s so cool I made a .gif of it in action. When you’re done putting your socks back on, I’ll tell you how it works.


This .gif was brought to you by the Cool SSMS Tricks Foundation, in association with Worldwide .gifs

 

Pure ALT+SHIFT magic.

Hold down both keys at the same time, and use your up and down arrow keys to navigate vertically. A thin grey line shows you exactly which rows you've grabbed. Then just type normally. I, uh, simulated a typing error, to illustrate that you can also delete text this way. Yeah.

It really makes doing simple multi-line edits a breeze, especially if you don’t feel like setting up Excel formulas to do similar tasks. These are random Massachusetts zip codes, which is why they get a leading zero, and quotes.

Can you feel the efficiency?!

Kendra says: What in the…. holy cow, that actually works!

Brent says: I knew about that trick, but ZOMG PEOPLE THERE IS A PRODUCTIVITY GIF IN OUR BLOG

Forcing Join Order Without Hints


Brent buys lunch for the ladies

The purpose of this post is to show a bit of syntax that often gets overlooked in favor of using query hints to force joins to occur in a particular order. We’ll start by creating three tables. One for employees, one for orders, and one for items in the order.

/*
An employees table! How novel!
*/
CREATE TABLE #Ozars (OzarID INT IDENTITY(1,1) NOT NULL, OzarName VARCHAR(30) NOT NULL)
INSERT INTO #Ozars (OzarName) VALUES ('Brent'), ('Jeremiah'), ('Kendra'), ('Doug'), ('Jessica'), ('Erik')
ALTER TABLE #Ozars ADD CONSTRAINT [PK_Ozars] PRIMARY KEY CLUSTERED (OzarID, OzarName)
/*
Luuuuuuuunch
*/
CREATE TABLE #Lunch (LunchID INT IDENTITY(1,1) NOT NULL, OzarID INT NOT NULL)
INSERT INTO #Lunch (OzarID) VALUES (1),(1),(1),(3),(5)
ALTER TABLE #Lunch ADD CONSTRAINT [PK_Lunch] PRIMARY KEY CLUSTERED (LunchID, OzarID)
/*
Brent called it in, so it's all under his ID. Because that's how restaurants work. By ID. Yep.
*/
CREATE TABLE #LunchOrders (LunchOrderID INT IDENTITY(1,1) NOT NULL, LunchID INT NOT NULL, Lunch VARCHAR(20))
INSERT INTO #LunchOrders (LunchID, Lunch) VALUES (1, 'Just Churros'), (1, 'Box of Wine'), (1, 'Kaled Kale')
ALTER TABLE #LunchOrders ADD CONSTRAINT [PK_LunchOrders] PRIMARY KEY CLUSTERED (LunchOrderID, LunchID)

A SQL celebrity gossip blog got a tip that someone from BOU ordered take-out. Not exactly an earth-shattering event, but querying minds want to know!

So they write a query, and then they look at the plan.

SELECT o.*
, l.*
, lo.*
FROM #Ozars o
LEFT JOIN #Lunch l ON l.OzarID = o.OzarID
INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID

And that’s way harsh. SQL went and changed our LEFT JOIN into an INNER JOIN. What was it thinking? Now we don’t know who Brent is having lunch with.


How in the why in the heck did that LEFT JOIN turn into an INNER JOIN? SQL is full of it today.

Who ordered the KALE?

Okay, we thought about it some. No more INNER JOIN.
We’ll get this done with another LEFT JOIN.

SELECT o.*
,l.*
,lo.*
FROM #Ozars o
LEFT JOIN #Lunch l ON l.OzarID = o.OzarID
LEFT JOIN #LunchOrders lo ON lo.LunchID = l.LunchID

Unless you’re Ernie, that’s wayyyyy too many Brents.

Too many Brents on the dance floor.

First of all, ew. But yeah, you can do this, and it will come back with the right results.

SELECT o.*
, lol.*
FROM #Ozars o
LEFT JOIN (
       SELECT l.*, lo.LunchOrderID, lo.Lunch
       FROM #Lunch l 
       INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID
) lol
ON o.OzarID = lol.OzarID

WHAT SCALARS ARE YOU COMPUTING? WHAT FRESH HELL DID YOU SPAWN FROM?

We can even try an OUTER APPLY. That’s a little nicer looking as a query…

SELECT o.*
, lol.*
FROM #Ozars o
OUTER APPLY (
       SELECT l.LunchID ,
              l.OzarID, 
              lo.LunchOrderID, 
              lo.Lunch
       FROM #Lunch l 
       INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID
       WHERE o.OzarID = l.OzarID
) lol

… But same yucky plan.

Hi, I’m a cool trick.

SELECT o.*
, l.*
, lo.*
FROM #Ozars o
LEFT JOIN #Lunch l --They see me LEFT JOIN...
INNER JOIN #LunchOrders lo  --Then INNER JOIN...
       ON lo.LunchID = l.LunchID --Then write both my
       ON o.OzarID = l.OzarID --ON clauses

Nicer plan and a little less CPU than the others, on average.

Pleased to meetcha!

This is an interesting concept to play with. With more than a couple JOINs, you can start using parentheses to group them together, like so:

SELECT o.*
, l.*
, lo.*
FROM #Ozars o
LEFT JOIN (#Lunch l 
INNER JOIN #LunchOrders lo 
       ON lo.LunchID = l.LunchID)
       ON o.OzarID = l.OzarID

If you switch the order of the ON clauses in the second to last query, you’ll get an error. Take a guess why in the comments!

/*
Clean me up, buttercup.
*/
--DROP TABLE #Ozars;
--DROP TABLE #Lunch;
--DROP TABLE #LunchOrders;

Brent says: Okay, first off, I don’t normally drink an entire box of wine for lunch, but I had to wash down all that kale. Second off, judging by these execution plans, SQL Server intercepted my wine delivery.

Kendra says: I’ve found parentheses join hints twice in production code in SQL Server. In both cases, nobody knew why they were there, what would happen if they rewrote the query, or even if they’d been used on purpose or if it was just an accident. If you use this technique, document your code heavily as to what you’re doing and why, or you’ll be that person everyone grumbles about.

Indexing for Windowing Functions


Hooray Windowing Functions

They do stuff that used to be hard to do, or took weird self-joins or correlated sub-queries with triangular joins to accomplish. Triangular joins crop up when there's a standalone inequality predicate, usually for getting a running total.

With Windowing Functions, a lot of the code complexity and inefficiency is taken out of the picture, but they still work better if you feed them some useful indexes.

What kind of index works best?

In general, what’s been termed a POC Index by Itzik Ben-Gan and documented to some extent here.

POC stands for Partition, Order, Covering. When you look at your code, you want to first index any columns you’re partitioning on, then any columns you’re ordering by, and then cover (with an INCLUDE) any other columns you’re calling in the query.

Note that this is the optimal indexing strategy for Windowing Functions, and not necessarily for the query as a whole. Supporting other operations may lead you to design indexes differently, and that’s fine.
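
As a rough template (the table and column names here are placeholders, not from any real table):

CREATE NONCLUSTERED INDEX [IX_POC]
ON [dbo].[SomeTable] ([PartitionColumn], [OrderColumn]) /* P, then O */
INCLUDE ([OtherSelectedColumn]); /* C: cover the rest */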

Everyone loves a demo

Here's a quick example with a little extra something extra for the indexing witches and warlocks out there. I'm using the Stack Exchange database; you can find out how to make it your favorite new test database here.

SET NOCOUNT ON

SET STATISTICS IO, TIME ON  

SELECT  [p].[OwnerUserId] ,
        [p].[CreationDate] ,
        SUM([p].[ViewCount]) OVER ( PARTITION BY [p].[OwnerUserId] ORDER BY [p].[CreationDate] ) AS [TotalViews]
FROM    [dbo].[Posts] AS [p]
WHERE   [p].[PostTypeId] = 1
        AND [p].[Score] > 0
        AND [p].[OwnerUserId] = 4653
ORDER BY [p].[CreationDate]
OPTION  ( RECOMPILE );

/*
Table 'Posts'. Scan count 5, logical reads 488907, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1180, logical reads 7095, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2328 ms,  elapsed time = 760 ms.
*/

The query above runs against the Posts table, which only has a Clustered Index on the Id column, and that does us absolutely no good here. There are tons of access operations and logical reads. Taking a look at the plan doesn't offer much:

I am a plan. Love me.

Let’s try a POC index to fix this up. I’m keeping ViewCount in the key because we’re aggregating on it. You can sometimes get away with just using it as an INCLUDE column instead.

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) 

We can note with a tone of obvious and ominous foreshadowing that creating this index on the entire table takes about 15 seconds. Insert culturally appropriate scary sound effects here.

Here’s what the plan looks like running the query again:

I'm a Scorpio. I like Datsuns and Winston 100s.

That key lookup is annoying.

Not all key lookups are due to output columns. Some of them are predicates.

We did a good job of reducing a lot of the ickiness from before:

/*
Table 'Worktable'. Scan count 1180, logical reads 7095, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 4973, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 168 ms.
*/

But we’re not happy. Why? Because we’re DBAs. Or developers. Or we just have to use computers, which are the worst things ever invented.

Behold the filtered index

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) 
WHERE [PostTypeId] = 1 AND [Score] > 0
WITH (DROP_EXISTING = ON)

Cool. This index only takes about three seconds to create. Marinate on that.

This query is so important and predictable that we can roll this out for it. How does it look now?

Well, but, WHY?

That key lookup is still there, and now 100% of the estimated magickal query dust cost. For those keeping track at home, this is the entirely new missing index SQL Server thinks will fix your relationship with your dad:

/*
USE [StackOverflow]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Posts] ([OwnerUserId],[PostTypeId],[Score])
INCLUDE ([CreationDate],[ViewCount])
GO
*/

But we took a nice chunk out of the IO and knocked a little more off the CPU, again.

/*
Table 'Worktable'. Scan count 1180, logical reads 7102, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 91 ms.
*/

What can we do here?

Include!

CREATE NONCLUSTERED INDEX [IX_POC_DEMO] ON [dbo].[Posts] 
(
[OwnerUserId], [CreationDate], [ViewCount]
) INCLUDE ([PostTypeId], [Score])
WHERE [PostTypeId] = 1 AND [Score] > 0
WITH (DROP_EXISTING = ON)

Running the query one last time, we finally get rid of that stinky lookup:

Bully for you!

And we’re still at the same place for IO:

/*
Table 'Worktable'. Scan count 1180, logical reads 7102, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 90 ms.
*/

What did we learn? Windowing functions are really powerful T-SQL tools, but you still need to be hip to indexing to get the most out of them.

Check out our free resources on Windowing Functions here.

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.

Jeff Moden talks at length about triangular joins here (registration required).

Indexing for GROUP BY


it’s not glamorous

And on your list of things that aren’t going fast enough, it’s probably pretty low. But you can get some pretty dramatic gains from indexes that cover columns you’re performing aggregations on.

We’ll take a quick walk down demo lane in a moment, using the Stack Overflow database.

query outta nowhere!

SET NOCOUNT ON

SET STATISTICS TIME, IO ON 

SELECT [v].[UserId], [v].[BountyAmount], SUM([v].[BountyAmount]) AS [BountyTotal]
FROM [dbo].[Votes] AS [v]
WHERE [v].[BountyAmount] IS NOT NULL
GROUP BY [v].[UserId], [v].[BountyAmount]

Looking at the plan, it’s pretty easy to see what happened. Since the data is not ordered by an index (the clustered index on this table is on an Id column not referenced here), a Hash Match Aggregate was chosen, and off we went.

Look how much fun we're having.

Zooming in a bit on the Hash Match, this is what it’s doing. It should look pretty familiar to you if you’ve ever seen a Hash Match used to JOIN columns. The only difference here is that the Hash table is built, scanned, and output. When used in a JOIN, a Probe is also built to match the Residual buckets, and then the results are output.

It's basically wiping its hands on its pants.

It took quite a bit of activity to do a pretty simple thing.


/*
Table 'Votes'. Scan count 5, logical reads 315406, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 3609 ms, elapsed time = 1136 ms.
*/

Since this query is simple, our index is simple.

CREATE NONCLUSTERED INDEX [IX_GRPBY] ON dbo.[Votes]
(
[BountyAmount], [UserId]
)

I’m using the BountyAmount column in the first position because we’re also filtering on it in the query. We don’t really care about the SUM of all NULLs.

Taking that new index out for a spin, what do we end up with?

Stream Theater

The Hash Match Aggregate has been replaced with a Stream Aggregate, and the Scan of the Clustered Index has been replaced with a Seek of the Non-Clustered Index. This all took significantly less work:


/*
Table 'Votes'. Scan count 1, logical reads 335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 278 ms.
*/

Zooming in on the Stream Aggregate operator, because we gave the Hash Match so much attention. Good behavior should be rewarded.

You make it look so easy, Stream Aggregate.

Filters, filters, filters

If we want to take it a step further, we can filter the index to avoid the NULLs all together.

CREATE NONCLUSTERED INDEX [IX_GRPBY] ON dbo.[Votes]
(
[BountyAmount], [UserId]
) WHERE [BountyAmount] IS NOT NULL
WITH (DROP_EXISTING = ON)

This results in very slightly reduced CPU and IO. The real advantage of filtering the index here is that it takes up nearly 2 GB less space than without the filter. Collect two drinks from your SAN admin.


/*
Table 'Votes'. Scan count 1, logical reads 333, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 233 ms.
*/

And, because I knew you’d ask, I did try making the same index with the column order reversed. It was not more efficient, because it ended up doing a Scan of the Non-Clustered Index instead, which results in a bit more CPU time.
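
The reversed version would look like this (a sketch; the index name is mine):

CREATE NONCLUSTERED INDEX [IX_GRPBY_REV] ON dbo.[Votes]
(
[UserId], [BountyAmount]
) WHERE [BountyAmount] IS NOT NULL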

If you like this sort of thing, you might be interested in our Advanced Querying & Indexing class this August in Portland, OR.


The sp_rename follies


Before we get started…

I know. I know. BOL. It’s documented.

They even show you how to rename a table.

Thanks, sp_rename!

But sometimes…

You just forget.

And as with most simple mistakes, fixing them is… Weird.

Here’s what happened to me recently, when I was working on a table swapping demo.

CREATE TABLE dbo.Whatever (i int)

SELECT *
FROM dbo.[Whatever] AS [w]

Here’s where I was going to rename it, and then make another table with the INT column as a BIGINT.

EXEC [sys].[sp_rename]
    @objname = N'dbo.Whatever' , 
    @newname = N'dbo.Whatever_New'  

Which worked, except…

Oh, that's just great.

dbo.dbo.

Like most people who make mistakes, I decided to revisit the documentation afterwards. And, yeah, you don’t specify schema in the new object name.

So, now that you all know you’re smarter than me, how would you fix it?

I’ll spare you the trial and error:

EXEC [sys].[sp_rename]
    @objname = N'dbo.[dbo.Whatever_New]' , 
    @newname = N'Whatever_New'   

There were quite a few different arrangements of brackets and schema prefixes leading up to this.

I hope this post saves someone a little time.

Brent says: dear reader, please use this as part of an April Fool’s prank. Just not in production.

Logical Query Processing


you can’t do that on management studio

Recently, while working with a client, I did something in a query that they were mystified by. I didn’t think much of it, but I thought it might be useful to you, dear readers, as well. Along with an explanation.

Here’s a sample query that takes advantage of the same type of trick, but with a few extra bats and worms added in to illustrate a larger point.

SELECT TOP 100
        [b].[Name] ,
        [b].[UserId] ,
        COUNT_BIG(*) AS [Coconuts]
FROM    [dbo].[Badges] AS [b]
WHERE [b].[Date] >= '2014-01-01 00:00:00.003'
GROUP BY [b].[Name] ,
        [b].[UserId]
HAVING  COUNT_BIG(*) > 100
ORDER BY [Coconuts] DESC;

can you dig it?

What I did was order by the alias of the COUNT_BIG(*) column, Coconuts.

What they didn't understand was why that's legal, while filtering on that alias isn't. A more familiar scenario might be using ROW_NUMBER(); you can ORDER BY it, but you can't filter on it in the WHERE clause to limit result sets to the TOP N per set. You have to materialize an intermediate result in a CTE or temp table and then filter (there's a sketch of that after the error examples below).

When SQL goes to figure out what to do with all this, it doesn’t look at it in the order you typed it. It’s a bit more like this:

8. SELECT
9. DISTINCT
11. TOP
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
10. ORDER BY
12. OFFSET/FETCH

To make that a little easier to read:

1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE/ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
12. OFFSET/FETCH

and that’s how babies get made

Since the ORDER BY is processed after the SELECT list, ORDER BY can use a column aliased there. You can’t do that in the WHERE clause because it gets processed before SQL does its fancy footwork to get you some stuff to look at.

Here are some examples of what happens when you try to move the alias to different parts of the query.

Using it in the HAVING filter:

SELECT TOP 100
        [b].[Name] ,
        [b].[UserId] ,
        COUNT_BIG(*) AS [Coconuts]
FROM    [dbo].[Badges] AS [b]
WHERE [b].[Date] >= '2014-01-01 00:00:00.003'
GROUP BY [b].[Name] ,
        [b].[UserId]
HAVING  [Coconuts] > 100
ORDER BY [Coconuts] DESC;

And again using it in the WHERE clause:

SELECT TOP 100
        [b].[Name] ,
        [b].[UserId] ,
        COUNT_BIG(*) AS [Coconuts]
FROM    [dbo].[Badges] AS [b]
WHERE [b].[Date] >= '2014-01-01 00:00:00.003'
AND [Coconuts] > 100
GROUP BY [b].[Name] ,
        [b].[UserId]
ORDER BY [Coconuts] DESC;

Both result in the same error, give or take a line number. Coconuts is not reference-able at either of these points.
Msg 207, Level 16, State 1, Line 33
Invalid column name 'Coconuts'.
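
And here's the workaround mentioned earlier, sketched out: put the aggregate in a CTE, and the outer query can filter on the alias all it wants, because by then the inner SELECT list has already been processed.

WITH [counts] AS (
SELECT  [b].[Name] ,
        [b].[UserId] ,
        COUNT_BIG(*) AS [Coconuts]
FROM    [dbo].[Badges] AS [b]
WHERE [b].[Date] >= '2014-01-01 00:00:00.003'
GROUP BY [b].[Name] ,
        [b].[UserId]
)
SELECT TOP 100 *
FROM [counts]
WHERE [Coconuts] > 100 /* legal here */
ORDER BY [Coconuts] DESC;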

Upcoming classes in Chicago and Portland - join us!

Finding Tables with Nonclustered Primary Keys and no Clustered Index


i’ve seen this happen

Especially if you've just inherited a database, or started using a vendor application. This can also be the result of inexperienced developers having free rein over index design.

Unless you’re running regular health checks on your indexes with something like our sp_BlitzIndex® tool, you might not catch immediately that you have a heap of HEAPs in your database.

You may be even further flummoxed upon finding that someone thoughtfully created Primary Keys with Nonclustered Indexes on them, yet no Clustered Indexes. Unless the original developer is still around, the intent may not be clear.

Using this code snippet, you can quickly identify tables that were created with Nonclustered Indexes on the Primary Key, and no Clustered Index. Another way to spot this potential issue might be looking for RID lookups, or Table Scans in your Plan Cache. Wide Nonclustered Indexes may also be present to compensate for the lack of a good Clustered Index.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT  QUOTENAME(SCHEMA_NAME([t].[schema_id])) + '.' + QUOTENAME([t].[name]) AS [Table] ,
        QUOTENAME(OBJECT_NAME([kc].[object_id])) AS [IndexName] ,
        ( SUM([a].[total_pages]) * 8 / 1024.0 ) AS [IndexSizeMB]
FROM    [sys].[tables] [t]
INNER JOIN [sys].[indexes] [i]
ON      [t].[object_id] = [i].[object_id]
INNER JOIN [sys].[partitions] [p]
ON      [i].[object_id] = [p].[object_id]
        AND [i].[index_id] = [p].[index_id]
INNER JOIN [sys].[allocation_units] [a]
ON      [p].[partition_id] = [a].[container_id]
INNER JOIN [sys].[key_constraints] AS [kc]
ON      [t].[object_id] = [kc].[parent_object_id]
WHERE   (
          [i].[name] IS NOT NULL
          AND OBJECTPROPERTY([kc].[object_id], 'CnstIsNonclustKey') = 1 --Unique Constraint or Primary Key can qualify
          AND OBJECTPROPERTY([t].[object_id], 'TableHasClustIndex') = 0 --Make sure there's no Clustered Index, this is a valid design choice
          AND OBJECTPROPERTY([t].[object_id], 'TableHasPrimaryKey') = 1 --Make sure it has a Primary Key and it's not just a Unique Constraint
          AND OBJECTPROPERTY([t].[object_id], 'IsUserTable') = 1 --Make sure it's a user table because whatever, why not? We've come this far
        )
GROUP BY [t].[schema_id] ,
        [t].[name] ,
        OBJECT_NAME([kc].[object_id])
ORDER BY SUM([a].[total_pages]) * 8 / 1024.0 DESC;

There are times when heaps are a valid choice

ETL or staging tables are the most common examples of when raw insert throughput is necessary, and a Clustered Index may not be.

But when tables are designed to be queried against and lack a Clustered Index, it’s usually a problem to be fixed.
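
When that's the case, and assuming the existing nonclustered Primary Key really should be the clustered one, the fix is a sketch like this (hypothetical names; check dependencies and test before running anything like it):

/* Hypothetical table and constraint names */
ALTER TABLE [dbo].[SomeTable] DROP CONSTRAINT [PK_SomeTable];

ALTER TABLE [dbo].[SomeTable] ADD CONSTRAINT [PK_SomeTable]
PRIMARY KEY CLUSTERED ([ID]);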

Brent says: and remember, kids, SQL Server won’t suggest a clustered index.

Kendra says: if you think this probably didn’t happen to you, that’s a good sign you should doublecheck.

Upcoming classes in Chicago and Portland - join us!

Logical Query Processing Follow-up


i like questions!

Because I don’t always like talking to myself, and robots are sometimes murderous and insane. So when this one came up in the comments of my previous post, I thought it would make a good follow-up blog. It’s a really good question, and definitely one I found myself asking quite a bit when I first started writing queries.

I hope you like big words, veljasije!

the fancy pants answer

Is that this isn’t a processing order issue, but rather a set processing issue. When you group by two columns, the resulting set makes ordering by a column outside of that set ambiguous.

Huh?

Think about it. If two columns of values are spread out over a range of unique IDs in a third column, which ID belongs to which grouped set if you group by the two other columns? The highest ID? The lowest ID? Something in between?

demonstrate my syntax

A quick example using a take on FizzBuzz. We’ll call it NYHCBuzz.

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  
        IDENTITY (BIGINT, 1,1) AS [ID] , 
        ISNULL(CONVERT(DATE, DATEADD(MINUTE, -[N].[N], GETDATE())),     '1900-01-01') AS [SomeDate] ,
        CASE WHEN n.[N] % 15 = 0 THEN 'Sheer Terror'
             WHEN n.[N] % 5 = 0 THEN 'Madball'
             WHEN n.[N] % 3 = 0 THEN 'Skarhead'
             ELSE 'Warzone' END AS [NYHCBuzz]
INTO [LowerEastSide]
FROM    [Numbers] [N]
ORDER BY [N] DESC;

SELECT * 
FROM [dbo].[LowerEastSide] AS [les]
ORDER BY [les].[ID];

And here’s a little snippet of what you end up with. This hopefully makes my point a bit more clear. You can’t order by both ID 11 and 26, for example. They would be part of the same group, and they’re buried in a range with hundreds of other values.

Ordering by the ID column wouldn’t make sense. Which ID would each grouped set be associated with?

Who knows? Not even SQL.

I like this playlist.

Running this will get you a big fat error.

SELECT [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY [les].[ID]


Msg 8127, Level 16, State 1, Line 26
Column "dbo.LowerEastSide.ID" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

So what can you do? You can’t group by the ID column, it’s a unique value per row. You’ll just get every row and a count of 1. That’s dumb and ugly.

You can work around it by telling SQL how to order within each group, with a query like this.

SELECT [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY MIN([les].[ID]) --Look at me

Adding the MIN() function to the ORDER BY tells SQL to take the lowest ID per grouped set for ordering. If you’re curious as to how that helps, look at the results with this query.

SELECT MIN([les].[ID]) AS [MinID], [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY MIN([les].[ID])

You can see pretty easily what SQL is doing here. For every grouped set of dates and names, you’re also grabbing the minimum ID value. The aggregation works to provide each set with a related value to also order by.

I would go to this show.

You can use any aggregate function of your choosing, really. I don’t know if any of these have a good business use case, but taking a look at the changes to both output ordering and ID column values is instructive when it comes to learning exactly how the aggregation is necessary to provide order to grouped sets.

SELECT MIN([les].[ID]) AS [MinID], [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY MIN([les].[ID])

SELECT MAX([les].[ID]) AS [MaxID], [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY MAX([les].[ID])

SELECT AVG([les].[ID]) AS [AvgID], [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY AVG([les].[ID])

SELECT SUM([les].[ID]) AS [SumID], [les].[SomeDate], [les].[NYHCBuzz], COUNT_BIG(*) AS [Records]
FROM [dbo].[LowerEastSide] AS [les]
GROUP BY [les].[SomeDate], [les].[NYHCBuzz]
ORDER BY SUM([les].[ID])

Just can't aggregate enough.

We cover all sorts of cool stuff like this (and more!) at Advanced Querying and Indexing. There’s even food. And prizes. And almost no one ever self-immolates, so that’s a big plus.

Upcoming classes in Chicago and Portland - join us!

Getting the last good DBCC CHECKDB date


whether it’s a new job or a new (old) server

If you’re reading this, you’ve at some point in your career stared at a server, for the first time, with great trepidation.

The smarter you are, the greater your trepidation is. The 2nd century mathematician Trepidatius the Wimpy had an equation that described this, but he only applied it to leaving his hut.

So the first thing you check is backups. Miraculously, someone is at least taking FULL backups. The logs and diffs are another story, but that’s why you’re getting paid. If your DBA checklist looks like mine, the next box down is seeing if someone has ever run DBCC CHECKDB to find corruption.

BUT HOW?

Since it’s my favorite test data set, I’ll use the StackOverflow database.

Ready?

DBCC DBINFO('StackOverflow') WITH TABLERESULTS

That’s it. But the output is a nightmare. It’s about 80 lines of stuff you will probably never care about. Around line 50 is what you’re looking for.

Hi, I'm nonsense.

And this is probably what you’ll see! A date of 1900-01-01 etc. That means never. If you run DBCC CHECKDB on the database, perhaps like so:

DBCC CHECKDB('StackOverflow') WITH NO_INFOMSGS, ALL_ERRORMSGS

And then re-run the DBCC DBINFO command, our date is now updated to current:

LOOK HOW MUCH FUN WE'RE HAVING

IS THIS THE ONLY WAY?

Of course not. But if you need a quick solution, there it is. The only catch is that it will update if you run your DBCC CHECKDB with PHYSICAL_ONLY set. Using that option skips the logical consistency checks that a full run of DBCC CHECKDB does.
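
If you'd rather get just the one row instead of the 80-line dump, you can shove the output into a temp table and filter it; a sketch (dbi_dbccLastKnownGood is the field DBINFO uses for this):

CREATE TABLE #DBInfo
(
[ParentObject] VARCHAR(255),
[Object] VARCHAR(255),
[Field] VARCHAR(255),
[Value] VARCHAR(255)
);

INSERT #DBInfo
EXEC ('DBCC DBINFO(''StackOverflow'') WITH TABLERESULTS');

SELECT [Value] AS [LastGoodCheckDB]
FROM #DBInfo
WHERE [Field] = 'dbi_dbccLastKnownGood';

DROP TABLE #DBInfo;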

If you’re a smarty pants, and you’re using Ola Hallengren’s maintenance scripts, you can check the CommandLog table it creates to [drumroll] log commands, see when DBCC CHECKDB was last run, and even how long it took.
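
A sketch, assuming the default dbo.CommandLog location (adjust the database name if you installed Ola's scripts elsewhere):

SELECT TOP (10) [DatabaseName] ,
       [StartTime] ,
       [EndTime] ,
       DATEDIFF(SECOND, [StartTime], [EndTime]) AS [DurationSeconds]
FROM [master].[dbo].[CommandLog]
WHERE [CommandType] = 'DBCC_CHECKDB'
ORDER BY [StartTime] DESC;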

If you’re doing something else, some guy named Brent wrote a stored procedure called sp_Blitz® that will tell you if any of the databases on your server have not had a consistency check run in the last two weeks. It will also tell you everything else wrong with that new server. It was one of my favorite things in the world, back when I had a real job.

If you like this stuff, and you want to get better at it, we’ll show you how! Join us for our upcoming Senior DBA Class in crime-free Chicago.

Kendra says: Ever been confused by those weird messages about CHECKDB in the SQL Server log when your instance starts up, but sometimes it might show a really old date? Fun fact: it’s actually looking up last CHECKDB run date for the database.

Clustered Index key columns in Nonclustered Indexes


clustered indexes are fundamental

And I’m not just saying that because Kendra is my spiritual adviser!

They are not ~a copy~ of the table, they are the table, ordered by the column(s) you choose as the key. It could be one. It could be a few. It could be a GUID! But that’s for another time. A long time from now. When I’ve raised an army, in accordance with ancient prophecy.

What I’d like to focus on is another oft-neglected consideration when indexing:

columns in the clustering key will be in all of your nonclustered indexes

“How do you know?”
“Can I join your army?”
“Why does it do that?”

You may be asking yourself all of those questions. I’ll answer one of them with a demo, and one of them with “no”.

D-D-D-DEMO FACE

USE [tempdb]

IF OBJECT_ID('tempdb..ClusterKeyColumnsTest') IS NOT NULL
DROP TABLE [tempdb]..[ClusterKeyColumnsTest]

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  
        IDENTITY (BIGINT, 1,1) AS [ID] ,  
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE()    )),  '1900-01-01') AS [OrderDate] ,
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE() + 1)), '1900-01-01') AS [ProcessDate] ,
        ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE() + 3)), '1900-01-01') AS [ShipDate] ,
        REPLICATE(CAST(NEWID() AS NVARCHAR(MAX)), CEILING(RAND() * 10)) AS [DumbGUID] 
INTO [ClusterKeyColumnsTest]
FROM    [Numbers] [N]
ORDER BY [N] DESC;

ALTER TABLE [ClusterKeyColumnsTest] ADD CONSTRAINT [PK_ClusterKeyColumnsTest] PRIMARY KEY CLUSTERED ([ID]) WITH (FILLFACTOR = 100)

CREATE NONCLUSTERED INDEX [IX_ALLDATES] ON [dbo].[ClusterKeyColumnsTest] ([OrderDate], [ProcessDate], [ShipDate]) WITH (FILLFACTOR = 100)

That code will create a pretty rudimentary table of random data. It was between 3-4 GB on my system.

You can see a Clustered Index was created on the ID column, and a Nonclustered Index was created on the three date columns. The DumbGUID column was, of course, neglected as a key column in both. Poor GUID.

Running these queries, the Nonclustered Index on the date columns will be used, because SQL does this smart thing where it takes page count into consideration when choosing an index to use.

SELECT [ckct].[ID]
FROM [dbo].[ClusterKeyColumnsTest] AS [ckct]

SELECT [ckct].[ID]
FROM [dbo].[ClusterKeyColumnsTest] AS [ckct]
WHERE [ckct].[OrderDate] = '2014-12-18'

SELECT [ckct].[ID], [ckct].[DumbGUID]
FROM [dbo].[ClusterKeyColumnsTest] AS [ckct]
WHERE [ckct].[OrderDate] = '2014-12-18'

Notice that the only time a Key Lookup was needed is for the last query, where we also select the DumbGUID column. You’d think it would be needed in all three, since the ID column isn’t explicitly named in the key or as an include in the Nonclustered Index.

Key Lookups really are really expensive and you should really fix them. Really.

sp_BlitzIndex® to the rescue

If you find yourself trying to figure out indexes, Kendra’s sp_BlitzIndex® stored procedure is invaluable. It can do this cool thing where it shows you SECRET columns!

Since I’ve already ruined the surprise, let’s look at the indexes on our test table.

EXEC [master].[dbo].[sp_BlitzIndex]
    @DatabaseName = N'tempdb' , 
    @SchemaName = N'dbo' , 
    @TableName = N'ClusterKeyColumnsTest'

Here’s the output. The detail it gives you on index columns and datatypes is really awesome. You can see the ID column is part of the Nonclustered Index, even though it isn’t named in the definition.

Ooh, shapes!

one step beyond

Run the code to drop and recreate our test table, but this time add these indexes below instead of the original ones.

ALTER TABLE [ClusterKeyColumnsTest] ADD CONSTRAINT [PK_ClusterKeyColumnsTest] PRIMARY KEY CLUSTERED ([ID], [OrderDate]) WITH (FILLFACTOR = 100)

CREATE NONCLUSTERED INDEX [IX_ALLDATES] ON [dbo].[ClusterKeyColumnsTest] ([ProcessDate], [ShipDate]) 
INCLUDE([DumbGUID])
WITH (FILLFACTOR = 100)

Running the same three queries as before, our plans change only slightly. The Key Lookup is gone, and the statement cost per batch has evened out.

You too can be a hero by getting rid of Key Lookups.

But notice that, for the second query, where we’re searching on the OrderDate column, we’re still scanning the Nonclustered Index.

We moved that out of the Nonclustered Index and used it as part of the Clustering Key the second time around. What gives?

Running sp_BlitzIndex® the same as before, OrderDate is now a secret column in the Nonclustered Index.

Fun Boy Three does the best version of Our Lips are Sealed, BTW.

did we learn anything?

Sure did!

  1. SQL ‘hides’ the columns from the Key of the Clustered Index in Nonclustered Indexes
  2. Since those columns are part of the index, you don’t need to include them in the definition
  3. Secret columns in Nonclustered Indexes can be used to avoid Key Lookups AND satisfy WHERE clause searches!

Well, at least until all indexes are ColumnStore Indexes.

Kendra says: One of the most common questions I get is whether there’s a penalty or downside if you list the columns from the clustered index in your nonclustered index. You can stop worrying: there’s no penalty.

Performance Benefits of Unique Indexes


sql server loves unique indexes

Why? Because it’s lazy. Just like you. If you had to spend all day flipping pages around, you’d probably be even lazier. Thank Codd someone figured out how to make a computer do it. There’s some code below, along with some screen shots, but…

TL;DR

SQL is generally pretty happy to get good information about the data it’s holding onto for you. If you know something will be unique, let it know. It will make better plan choices, and certain operations will be supported more efficiently than if you make it futz around looking for repeats in unique data.

There is some impact on inserts and updates as the constraint is checked, but generally it’s negligible, especially when compared to the performance gains you can get from select queries.

So, without further ado!

Q: What was the last thing the Medic said to the Heavy?

A: Demoooooooooo!

We’ll start off by creating four tables. Two with unique clustered indexes, and two with non-unique clustered indexes, that are half the size. I’m just going with simple joins here, since they seem like a pretty approachable subject to most people who are writing queries and creating indexes. I hope.

USE [tempdb]

/*
The drivers
*/
;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT 
ISNULL([N].[N], 0) AS ID,
ISNULL(CONVERT(DATE, DATEADD(SECOND, [N].[N], GETDATE())),     '1900-01-01') AS [OrderDate],
ISNULL(SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) , 'AAAAAAAA') AS [PO]
INTO UniqueCL
FROM [Numbers] N

ALTER TABLE UniqueCL ADD CONSTRAINT [PK_UniqueCL] PRIMARY KEY CLUSTERED ([ID]) WITH (FILLFACTOR = 100)


;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT 
ISNULL([N].[N], 0) AS ID,
ISNULL(CONVERT(DATE, DATEADD(SECOND, [N].[N], GETDATE())),     '1900-01-01') AS [OrderDate],
ISNULL(SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) , 'AAAAAAAA') AS [PO]
INTO NonUniqueCL
FROM [Numbers] N

CREATE CLUSTERED INDEX [CLIX_NonUnique] ON dbo.NonUniqueCL ([ID]) WITH (FILLFACTOR = 100)


/*
The joiners
*/

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT 
ISNULL([N].[N], 0) AS ID,
ISNULL(CONVERT(DATE, DATEADD(SECOND, [N].[N], GETDATE())),     '1900-01-01') AS [OrderDate],
ISNULL(SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) , 'AAAAAAAA') AS [PO]
INTO UniqueJoin
FROM [Numbers] N
WHERE [N] < 5000001

ALTER TABLE UniqueJoin ADD CONSTRAINT [PK_UniqueJoin] PRIMARY KEY CLUSTERED ([ID], [OrderDate]) WITH (FILLFACTOR = 100)

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT 
ISNULL([N].[N], 0) AS ID,
ISNULL(CONVERT(DATE, DATEADD(SECOND, [N].[N], GETDATE())),     '1900-01-01') AS [OrderDate],
ISNULL(SUBSTRING(CONVERT(VARCHAR(255), NEWID()), 0, 9) , 'AAAAAAAA') AS [PO]
INTO NonUniqueJoin
FROM [Numbers] N
WHERE [N] < 5000001

CREATE CLUSTERED INDEX [CLIX_NonUnique] ON dbo.NonUniqueJoin ([ID], [OrderDate]) WITH (FILLFACTOR = 100)

Now that we have our setup, let’s look at a couple queries. I’ll be returning the results to a variable so we don’t sit around waiting for SSMS to display a bunch of uselessness.

DECLARE @ID BIGINT;

SELECT  @ID = [uc].[ID]
FROM    [dbo].[UniqueCL] AS [uc]
JOIN    [dbo].[UniqueJoin] AS [uj]
ON      [uj].[ID] = [uc].[ID]
WHERE   [uc].[ID] % 2 = 0
ORDER BY [uc].[ID];

GO  

DECLARE @ID BIGINT;

SELECT  @ID = [nuc].[ID]
FROM    [dbo].[NonUniqueCL] AS [nuc]
JOIN    [dbo].[NonUniqueJoin] AS [nuj]
ON      [nuj].[ID] = [nuc].[ID]
WHERE   [nuc].[ID] % 2 = 0
ORDER BY [nuc].[ID];

GO

What does SQL do with these?

Ugly as a river dolphin, that one.

Not only does the query for the unique indexes choose a much nicer merge join, it doesn't even get considered for going parallel. The batch cost is about 1/3, and the sort is fully supported.

The query against non-unique tables requires a sizable memory grant, to boot.

Looking at the STATISTICS TIME and IO output, there’s not much difference in logical reads, but you see the non-unique index used all four cores available on my laptop (4 scans, 1 coordinator thread), and there’s a worktable and workfile for the hash join. Overall CPU time is much higher, though there’s only ever about 100ms difference in elapsed time over a number of consecutive runs.

Table 'UniqueJoin'. Scan count 1, logical reads 3969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UniqueCL'. Scan count 1, logical reads 3968, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 264 ms.

Table 'NonUniqueCL'. Scan count 5, logical reads 4264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NonUniqueJoin'. Scan count 5, logical reads 4264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1186 ms, elapsed time = 353 ms.

fair fight

So, obviously going parallel threw some funk on the floor. If we force a MAXDOP of one on the non-unique query, what happens?
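In case you’re playing along, that’s just the non-unique query from above with a hint bolted on; MAXDOP 1 is the only change:

DECLARE @ID BIGINT;

SELECT  @ID = [nuc].[ID]
FROM    [dbo].[NonUniqueCL] AS [nuc]
JOIN    [dbo].[NonUniqueJoin] AS [nuj]
ON      [nuj].[ID] = [nuc].[ID]
WHERE   [nuc].[ID] % 2 = 0
ORDER BY [nuc].[ID]
OPTION  (MAXDOP 1); -- Cap this query at a single scheduler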

You Get Nothing! You Lose! Good Day, Sir!

Yep. Same thing, just single threaded this time. The plan looks a little nicer, sure, but now the non-unique part is up to 85% of the batch cost, from, you know, that other number. You’re not gonna make me say it. This is a family-friendly blog.

Going back to TIME and IO, the only noticeable change is in CPU time for the non-unique query. Still needed a memory grant, still has an expensive sort.

Table 'UniqueJoin'. Scan count 1, logical reads 3969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UniqueCL'. Scan count 1, logical reads 3968, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 264 ms.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NonUniqueJoin'. Scan count 1, logical reads 4218, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NonUniqueCL'. Scan count 1, logical reads 4218, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 766 ms, elapsed time = 807 ms.

just one index

The nice thing is that a little uniqueness goes a long way. If we join the unique table to the non-unique join table, we end up with nearly identical plans.
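Here’s what that looks like; same pattern as before, just with the unique clustered table on one side of the join:

DECLARE @ID BIGINT;

SELECT  @ID = [uc].[ID]
FROM    [dbo].[UniqueCL] AS [uc]
JOIN    [dbo].[NonUniqueJoin] AS [nuj] -- One unique, one not
ON      [nuj].[ID] = [uc].[ID]
WHERE   [uc].[ID] % 2 = 0
ORDER BY [uc].[ID];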

You’re such a special flower.

Table 'UniqueJoin'. Scan count 1, logical reads 3969, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UniqueCL'. Scan count 1, logical reads 3968, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 265 ms,  elapsed time = 267 ms.

Table 'NonUniqueJoin'. Scan count 1, logical reads 4218, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'UniqueCL'. Scan count 1, logical reads 3968, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 266 ms,  elapsed time = 263 ms.

done and done

So, you made it to the end. Congratulations. I hope your boss didn’t walk by too many times.

By the way, the year is 2050, the Cubs still haven’t won the World Series, and a horrible race of extraterrestrials has taken over the Earth and is using humans as slaves to mine gold. Wait, no, that’s something else.

But! Hey! Brains! You have more of them now, if any of this was enlightening to you. If you spaced out and just realized the page stopped scrolling, here’s a recap:

  • Unique indexes: SQL Server likes ’em
  • You will generally see better plans when the optimizer isn’t concerned with duplicate values
  • There’s not a ton of downside to using them where possible
  • Even one unique index can make a big difference when joined to a non-unique index

As an aside, this was all tested on SQL Server 2014. An exercise for you, Dear Reader: if you have SQL Server 2012, look at the tempdb spills that occur on the sort and hash operations for the non-unique indexes. I’m not including them here because it’s a bit of a detour. It’s probably not the most compelling reason to upgrade, but it’s something to consider — tempdb is way less eager to write to disk these days!
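If you want to catch those spills in the act without setting up a trace, the actual execution plan shows sort and hash warnings on 2012 and up, and you can peek at your own session’s tempdb allocations with a DMV. A rough sketch; page counts are a hint, not a smoking gun:

SELECT  [session_id],
        [internal_objects_alloc_page_count] -- Sort and hash spills allocate internal objects in tempdb
FROM    [sys].[dm_db_session_space_usage]
WHERE   [session_id] = @@SPID;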

Thanks for reading!

Brent says: I always wanted proof that unique clustered indexes made for better execution plans!


Is leading an index with a BIT column always bad?


“Throughout history, slow queries are the normal condition of man. Indexes which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people who don’t think bit columns are selective enough to lead index keys. Whenever this tiny minority is kept from creating indexes, or (as sometimes happens) is driven out of a SCRUM meeting, the end users then slip back into abject query performance.

This is known as “Business Intelligence.”

–Bobby Q. Heinekens

Fake quotes and people aside

Let’s look at a scenario where you have a BIT column that’s fairly selective, and perhaps the rest of your predicates are ranged. This isn’t so out of the ordinary, especially because people like to know when stuff happened and how many times it happened.

“How many lunches have I eaten today?”

“Where did that bear learn to drive?”

“Am I being slowly disassembled on a molecular level by millions of tiny black holes?”

Yes, China or Florida, and Probably!

So let’s figure this out quick

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  
        IDENTITY (BIGINT, 1,1) AS [ID] , 
    ABS(CHECKSUM(NEWID()) / 100000000) + 1 AS [CustomerID],
        ISNULL(CONVERT(DATE, DATEADD(MINUTE, -[N].[N], GETDATE())),     '1900-01-01') AS [OrderDate] ,
        CASE WHEN [N].[N] % 19 = 0 THEN 1 ELSE 0 END AS [isBit]
INTO [NotAshleyMadisonData]
FROM    [Numbers] [N]
ORDER BY [OrderDate]
;

ALTER TABLE [NotAshleyMadisonData] ADD CONSTRAINT [PK_NotAshleyMadisonData] PRIMARY KEY CLUSTERED ([ID]) WITH (FILLFACTOR = 100)

CREATE NONCLUSTERED INDEX [IX_BITFIRST] ON [dbo].[NotAshleyMadisonData]
([isBit], [OrderDate])

CREATE NONCLUSTERED INDEX [IX_DATEFIRST] ON [dbo].[NotAshleyMadisonData]
([OrderDate], [isBit])

Here’s one table with one million rows in it. Since it’s random, if you run this on your own it may turn out a little different for you, but I’m sure you can adapt. You are, after all, wearing the largest available diaper size.

I’ve also gone ahead and created two indexes (neither one filtered!) to avoid the appearance of impropriety. The first one goes against the oft-chanted mantra of never leading your index with a BIT column. The other complies with your thumb-addled rules of index creation, where the more unique column comes first, though it breaks the opposing rule to lead your index with equality predicates and follow with range predicates.

Only 52,631 rows out of a million have a BIT value of 1. And with the exception of the first and last date values, each date has 75 or 76 BIT = 1 rows.
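Since the data is random, your numbers will wander a little. A couple of quick counts will tell you what your own distribution looks like:

-- How many BIT = 1 rows total, and how many per day?
SELECT  COUNT_BIG(*) AS [BitOneRows]
FROM    [dbo].[NotAshleyMadisonData]
WHERE   [isBit] = 1;

SELECT  [OrderDate], COUNT_BIG(*) AS [BitOnePerDay]
FROM    [dbo].[NotAshleyMadisonData]
WHERE   [isBit] = 1
GROUP BY [OrderDate]
ORDER BY [OrderDate];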

If you had to do this in your head, which would you do first? Find all the BIT = 1 rows, and then only count occurrences from the desired range? Or would you find your start and end dates and then count all the BIT = 1 values?

(Hint: it doesn’t matter, you’re not the query engine. Unless you’re Paul White. Then maybe you are. Has anyone seen them in the same room together?)

Images of Query Plans

SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2015-08-20' --All dates

SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2014-09-01' --About half the dates

SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] BETWEEN '2013-09-25'	AND	 '2014-09-01'
AND [namd].[isBit] = 1 -- Flipping them doesn't change anything

SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd]
WHERE [namd].[OrderDate] = '2013-09-26' --It's not until here that the other index gets used
AND [namd].[isBit] = 1
Peter Godwin will never play my birthday party.
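If you want to compare the two plans head to head, you can pin each query to an index with a table hint. This is a test-harness trick, not something for production code:

-- Force each index in turn and compare plans and reads.
SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd] WITH (INDEX ([IX_BITFIRST]))
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25' AND '2014-09-01';

SELECT COUNT_BIG(*) AS [Records]
FROM [dbo].[NotAshleyMadisonData] AS [namd] WITH (INDEX ([IX_DATEFIRST]))
WHERE [namd].[isBit] = 1
AND [namd].[OrderDate] BETWEEN '2013-09-25' AND '2014-09-01';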

Put on your recap cap

This is another case where knowing your data and the query patterns in your environment is important. It’s easy to overlook or reject obvious things when you’re bogged down by dogma.

The stamp of approval for an idea shouldn’t come from blogs, forums, white papers, or hash tags. It should come from how much something helps in your environment.

Thanks for reading!


Why You Should Apply (Take It From The New Guy)


Personal Benchmark

I’ve always thought that I’d know I have the right job when, if I were to win a dynastic sum of money from the lottery, or inherit it from a long-lost relative, I would still show up for work every day.

This is that job.

Seven months ago, when I saw that the Brent Ozar team was looking to expand, I said to my wife “hey, those people who sent us a Christmas card are hiring”. Her response was something along the lines of “will you still have to work every weekend?”

I haven’t had to work a single weekend. So if that’s your thing, you can probably stop reading. Weirdo.

It was a no-brainer for me. I love SQL, I love learning about technology in general, I love the style of the company, and growing up with a teacher for a mom had given me an itch to teach, just not to a room full of kids. Though sometimes a room full of developers isn’t too far off.

Erik at the company retreat

I kid, I kid! Developers are my people.

My special people.

Self-assessment

Before applying, I thought to myself, “someone way more qualified than me is filling out the form right now, and I don’t stand a chance.”

Three interviews and some paperwork later, I was putting in my notice, and singing “I’ve got a golden ticket” while skipping home through the Financial District to our Chinatown apartment with a bag full of champagne. That’s about when I started paying attention to what color traffic lights are before crossing the street.

My point is, it’s not just about being someone who has memorized a bunch of know-it-all facts, who can recite them chapter and verse to a WebEx full of yawning stakeholders and tech people who fear for their jobs. It’s about being comfortable enough with your level of knowledge to be able to present facts to people in a way that will improve their situation.

Not everyone needs multiple secondary, geo-synchronous Availability Groups on FCIs, across every tectonic plate, replicated to secret underground bunkers on SANs the size of Rhode Island (or a real state).

To most people, SQL Server is still that shadow in the dark that makes them keep a bottle of hooch in their desk drawer. They just want their apps to work.

Cash 4 Brains

I had never had anyone senior to me when I was working with SQL in my previous jobs. I had bosses and managers, sure, but I was always a direct report. Needless to say, I wasted a lot of time dropping dumb bombs. I would have killed (figuratively, of course) for the opportunity to sit in a chat room with these people and be able to ask them questions. All day. Every day. With gifs. And custom emoji.

Well, you get the point. Even if you forget about the amazing benefits, becoming a cartoon, getting to work from home and all that, you get to show up to work and learn from some of the funniest, smartest, most down to earth people in the SQL community.

You’re probably wondering if there are any downsides.

Spoiler: you have to buy your own coffee.


Window Functions and Cruel Defaults


My First Post Here…

Well, my first technical post was about how the default index creation method is OFFLINE. If you want that sweet, sweet Enterpri$e Edition ONLINE goodness, you need to specify it. It’s been a while since that one; almost six months to the day. So here’s another one!

But Window Functions Are Awesome

Heck yeah they are. And how. Boy howdy. Etc. You get the point. I’m enthusiastic. What can be cruel about them? Glad you asked!

Window Functions, according to the almighty ANSI Standard, have two ways of framing data: RANGE and ROWS. Without getting into the implementation differences between the ANSI Standard and Microsoft’s versions, or any performance differences between the two, there’s a funny difference in how they handle aggregations when ordered by non-unique values. A simple example using the Stack Overflow database follows.

SELECT  [OwnerUserId] ,
        CAST([CreationDate] AS DATE) AS [DumbedDownDate] ,
        [Score] ,
        SUM([Score]) OVER ( ORDER BY CAST([CreationDate] AS DATE) ) AS [Not_Specified] ,
        SUM([Score]) OVER ( ORDER BY CAST([CreationDate] AS DATE) RANGE UNBOUNDED PRECEDING ) AS [Range_Specified] ,
        SUM([Score]) OVER ( ORDER BY CAST([CreationDate] AS DATE) ROWS UNBOUNDED PRECEDING ) AS [Rows_Specified]
FROM    [dbo].[Posts]
WHERE   [OwnerUserId] = 1
        AND CAST([CreationDate] AS DATE) BETWEEN '2008-08-01'
                                         AND     '2008-08-31'
ORDER BY CAST([CreationDate] AS DATE);

For the month of August, Year of Our Codd 2008, we’re getting a running total of the score for posts by UserId 1. Who is UserId 1? I’ll never tell. But back to the syntax! In the first SUM, we’re not specifying anything; for the next two, we’re specifying RANGE and then ROWS. Why? REASONS! And why am I casting the CreationDate column as a date? MORE REASONS!

Before you scroll down, think for a second:

  • If I don’t specify RANGE or ROWS, which will SQL Server use?
  • If I left the CreationDate column as DATETIME, what difference would it make to the output?

Do you see a pattern forming here?

OH MY GOD IT WORKED

When we don’t specify RANGE or ROWS, well, SQL Server is nice enough to pick RANGE for us. “Nice”.

Whose fault? Default!

Deep breaths, Erik. Deep breaths.

You should also notice the difference in how each different method aggregates data. When the ordering column has duplicates, RANGE, and by extension, the default method, will SUM all the values for that group at once. When ROWS is specified as the framing method, you see the running total that most people are after.
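If you don’t have the Stack Overflow database handy, here’s a tiny self-contained sketch that shows the same split. Note the tie on OrderDate:

-- Two rows tie on OrderDate, so RANGE sums the whole tied group at once,
-- while ROWS gives a per-row running total.
DECLARE @Posts TABLE ([OrderDate] DATE, [Score] INT);
INSERT @Posts VALUES ('2008-08-01', 10), ('2008-08-01', 20), ('2008-08-02', 5);

SELECT  [OrderDate],
        [Score],
        SUM([Score]) OVER ( ORDER BY [OrderDate] RANGE UNBOUNDED PRECEDING ) AS [Range_Specified], -- 30, 30, 35
        SUM([Score]) OVER ( ORDER BY [OrderDate] ROWS UNBOUNDED PRECEDING )  AS [Rows_Specified]   -- 10, 30, 35 (order within the tie is arbitrary)
FROM    @Posts;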

Make project managers happy!

And, of course, if all the values were unique, they’d do the same thing.

SELECT  [OwnerUserId] ,
        [CreationDate] ,
        [Score] ,
        SUM([Score]) OVER ( ORDER BY [CreationDate] ) AS [Not_Specified] ,
        SUM([Score]) OVER ( ORDER BY [CreationDate]  RANGE UNBOUNDED PRECEDING ) AS [Range_Specified] ,
        SUM([Score]) OVER ( ORDER BY [CreationDate]  ROWS UNBOUNDED PRECEDING ) AS [Rows_Specified]
FROM    [dbo].[Posts]
WHERE   [OwnerUserId] = 1
        AND CAST([CreationDate] AS DATE) BETWEEN '2008-08-01'
                                         AND     '2008-08-31'
ORDER BY [CreationDate];
Back for a day

Wrap. It. Up.

This one is pretty self-explanatory. If you’re lucky enough to be on SQL Server 2012 or greater, and you’re using Window Functions to their full T-SQL potential, it’s way easier to calculate running totals. Just be careful how you write your code.

If you like this sort of stuff, check out Doug’s new video series, T-SQL Level Up. There are next to zero fart jokes in it.


SQL Server Features I’d Like To See, Oracle Edition


BUT FRANCE HAS A PONY

I really like SQL Server. Most of the time. Okay, so most of the time I like SQL Server most of the time. Don’t get me wrong, if I had to go back through the career-time continuum and pick a RDBMS to work with, I’d probably still choose it over Oracle. Probably. And, because I don’t exclusively grow facial hair from my neck, I wouldn’t be allowed to choose PostgreSQL. They’d kick me off the mailing list.

Just kidding. You’re all handsome rogues. We could have had a nice life together, staring longingly into each other’s shoes and trying to implement parallelism.

I’d have DB2 here, but the cost of entry to the Developer Edition is rather steep. So, you know, I’m sure it’s great! But no. Though I would be really happy if Microsoft implemented ANSI Standard constructs into T-SQL half as fast as IBM does.

I have poked at Oracle and PostgreSQL a bit, and found they have some really cool stuff. Heresy, right?

Check out some of these Oracle gadgets and tell me they wouldn’t make your life a whole lot easier.

In no particular order:

Table restores! Built in! I’m very surprised we never got a feature like this. You can do it with a 3rd party tool like Dell LiteSpeed.

Adaptive Plans! Go to the link and read the second paragraph. Read it twice. Wipe the drool off your face.

In-Database Row Archiving! You know all that stuff you do with partitions that Oracle already does better? Where you’re basically praying for partition elimination to not undo the two weeks of work you put in to setting up this partitioned table that developers are writing horrible MERGE upserts to? Yeah. You can just tell the engine to not pay attention to rows you don’t care about anymore when it accesses the index. Fancy that.

Bitmap Indexes! It’s kind of like a filtered index, except for all values of a highly non-selective column.

Materializing CTEs! Even though it’s undocumented, we use plenty of undocumented stuff in SQL Server to get the job done. This is really cool to me, since I’ve discussed this limitation in CTEs before. I’d love to see a way to do this in SQL Server with the same behavior, without having to create temp tables. It would be a nice way to get around issues with caching statistics for temp tables, especially since MS is still fixing bugs around temp tables.
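In the meantime, the workaround is to materialize the thing yourself. A sketch, with a hypothetical rollup standing in for whatever expensive CTE you’d otherwise reference over and over:

-- Instead of referencing an expensive CTE three times (and running it three times),
-- dump the result into a temp table once and hit that instead.
SELECT   [OwnerUserId], COUNT_BIG(*) AS [PostCount]
INTO     #PostCounts
FROM     [dbo].[Posts]
GROUP BY [OwnerUserId];

SELECT TOP (10) * FROM #PostCounts ORDER BY [PostCount] DESC;
SELECT AVG([PostCount]) AS [AvgPostCount] FROM #PostCounts;

DROP TABLE #PostCounts;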

Are there more? Yeah, but this is a blog. Go grab a VirtualBox and read the documentation if you’re interested in learning more.


SQL Server Features I’d Like To See, PostgreSQL Edition


BUT FRANCE HAS A FREE PONY

That’s right. PostgreSQL is basically free. The PostgreSQL License (an MIT-style permissive license) is like the Church of England of licenses. They do not care what you do.

But I care what they do! If you read the first part of this article, you saw some Oracle features that I wish SQL Server had. Over here, I’ll be talking about some cool stuff PG does that SQL doesn’t do. At least not ‘natively’. There are workarounds, but we’re still getting some circles run around us.

So here goes!

Unlogged Tables! Forget wrestling with minimal logging, which sometimes just doesn’t work, no matter which Trace Flags and hints you throw at it. You can just tell the engine you don’t care about this table and it won’t log any transactions for it. Yeah, put that in your ETL and smoke it.

Generate_series! This is one of those things that I used, and then spent hours playing with. You know all that crazy stuff we do with Tally Tables and Date tables? Yeah, PG users just use generate_series, and it spits out the range of values they want. It works with numbers and dates, and even handles intervals. It’s pretty wonderful. Sick burn.
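For comparison, here’s roughly what a month of dates costs you in T-SQL using the old spt_values trick, which is undocumented and capped at 2,048 numbers. That cap is sort of the point:

-- A poor man's generate_series: the 31 days of January 2015.
-- master.dbo.spt_values type 'P' holds the numbers 0 through 2047.
SELECT  DATEADD(DAY, [v].[number], '2015-01-01') AS [TheDate]
FROM    [master].[dbo].[spt_values] AS [v]
WHERE   [v].[type] = 'P'
AND     [v].[number] BETWEEN 0 AND 30;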

Arrays! Between array_agg and UNNEST, PG offers simple ways to handle a couple things that SQL Server users spend a lot of time trying to hack together T-SQL and CLR methods to deal with. If you’ve ever been mortified and confused by those SELECT STUFF FOR XML PATH shenanigans we use to create lists/arrays, or read one of the bounty of articles and arguments about splitting strings in SQL Server, you’d probably buy features like this a steak dinner.
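For the uninitiated, this is the ceremony PG users get to skip whenever they want a comma-separated list. A sketch against a hypothetical Tags table:

-- Build a 'sql-server,t-sql,indexes' style list the hard way.
-- STUFF lops off the leading comma that FOR XML PATH('') leaves behind.
SELECT STUFF(( SELECT ',' + [t].[TagName]
               FROM   [dbo].[Tags] AS [t]
               ORDER BY [t].[TagName]
               FOR XML PATH('') ), 1, 1, '') AS [TagList];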

BRIN Indexes! As a guy who has dealt with a lot of large tables, and large indexes, something like this is really awesome. A Block Range Index basically stores high and low keys for each, you guessed it, Block Range within the index, and decides which blocks it needs based on those values. The upside is that indexes of this type are TINY compared to traditional indexes. It’s a lot more like scanning the statistics histogram to figure out which steps you need, and then retrieving those steps.

Multiple language stored procedures! Yep. Write a stored procedure in a language you’re comfortable with. Tell PG what the language is at the end of the stored proc, and it will use it. If you’ve ever used CLR, and struggled with .dlls and TRUSTWORTHY settings and blah blah blah, this probably sounds like a dream.

MINDOP! Just kidding. But I do wish we had this. It’d be way more useful than, like, Service Broker, or Resource Governor, or Affinity Masking, or Priority Boost, or… I’m getting carried away. Apologies; Parallelism riles me.

Anyway, I hope you enjoyed these, and I hope that you’ll give other database systems a look. There’s some pretty cool stuff out there.

