This Isn’t A Trick Question
Hopefully it’ll get you thinking about your tables, and how they’re designed. One of the most consistent problems I see with clients is around wide tables.
I don’t mean data types, I mean the number of columns.
Going back to Michael Swart’s 10% Rule, if your tables have > 100 columns in them, you’re likely going to run into trouble.
What Makes Them Bad?
They’re nearly impossible to index efficiently:
- Queries will hit them in many different ways
- WHERE clauses will be unpredictable
- SELECT lists will vary wildly
When indexes pile up to support all these different queries, locking and blocking will start to become larger issues.
While some of them can be solved with optimistic isolation levels, writer on writer conflicts are really tough to avoid.
First Sign Of Problems: Prefixed Columns
Do you have columns with similar prefixes?
If you have naming patterns like this, it’s time to look at splitting those columns out.
I took the Users and Posts tables from Stack Overflow and mangled them a bit to look like this.
You may not have tables with this explicit arrangement, but it could be implied all over the place.
One great way to tell is to look at your indexes. If certain groups of columns are always indexed together, or if there are lots of missing index requests for certain groups of columns, it may be time to look at splitting them out into different tables.
Second Sign Of Problems: Numbered Columns
Do you allow people multiple, optional values?
The problems you’ll run into here will be searching across all of those.
You’ll end up with queries like this
SELECT * FROM dbo.Yuck WHERE Tag1 = '...' OR Tag2 = '...' OR Tag3 = '...' OR Tag4 = '...' OR Tag5 = '...'
Which can throw the optimizer a hard curve ball, and make indexing awkward.
This should also most likely be broken out into a table of its own that tracks the Post Id and Tag Id, along with a table that tracks the Ids of each Tag.
A wider index across a narrower table is typically less troublesome.
Third Sign Of Problems: Lists In Columns
This should be obvious, and has a similar solution to the problem up there.
Your queries will end up doing something like this:
SELECT 'stuff' FROM dbo.Posts AS p WHERE p.Tags LIKE '%...%'
Which can’t be indexed terribly well, even if you go our of your mind with trigrams.
I See Tables Within Tables
If you have tables with these patterns, it’s time to take a really close look at them.
If you’re totally lost on this, Check out Louis Davidson’s book on relational design.
Stuff like this is easy to sketch out, but often difficult to get fixed. It requires application changes, moving lots of data, and probably dropping indexes.
It’s totally worth it when you get it done though, because it makes your tables far easier to index and manage.
You’ll need far fewer insanely wide indexes to compensate for bad design, and you’ll have way less head scratcher missing indexe requests to sort through.
Thanks for reading!
Free online training next Friday - register now for GroupBy September.