Quantcast
Channel: Erik Darling – Brent Ozar Unlimited®
Viewing all articles
Browse latest Browse all 370

SUM, AVG, and arithmetic overflow

$
0
0

You Shoulda Brought A Bigger Int

Sometimes you run a query, and everything goes fine.

For a while. For example, if I run this query in the 2010 copy of Stack Overflow, it finishes pretty quickly, and without error.

SELECT u.Id, 
       u.DisplayName, 
	   SUM(p.Score) AS SumPostScore,
	   AVG(c.Score) AS SumCommentScore
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN dbo.Comments AS c
    ON u.Id = c.UserId
WHERE u.Reputation >= 10000
AND   p.PostTypeId = 2
AND   p.Score >= 10
AND   c.Score >= 1
GROUP BY u.Id, 
         u.DisplayName

If I run this query in the full version, it runs for a minute and a half, and then returns an error:

Msg 8115, Level 16, State 2, Line 4
Arithmetic overflow error converting expression to data type int.

They Shoot Horses, Don’t They?

Now, we’ve had COUNT_BIG for ages. Heck, we even need to use it in indexed views when they perform an aggregation. More recently, we got DATEDIFF_BIG. These functions allow for larger integers to be handled without additional math that I hate.

If one day you got an email that this query suddenly started failing, and you had to track down the error message, what would you do?

I’d probably wish I wore my dark pants and start running DBCC CHECKDB.

I kid, I kid.

There are no pants here.

Anyway, it’s easy enough to fix.

SELECT u.Id, 
       u.DisplayName, 
	   SUM(CONVERT(BIGINT, p.Score)) AS SumPostScore,
	   AVG(CONVERT(BIGINT, c.Score)) AS SumCommentScore
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON u.Id = p.OwnerUserId
JOIN dbo.Comments AS c
    ON u.Id = c.UserId
WHERE u.Reputation >= 10000
AND   p.PostTypeId = 2
AND   p.Score >= 10
AND   c.Score >= 1
GROUP BY u.Id, 
         u.DisplayName

You just have to lob a convert to bigint in there.

Data Grows, Query Breaks

This is a pretty terrible situation to have to deal with, especially if you have a lot of queries that perform aggregations like this.

Tracking them all down, and adding in fixes is arduous, and there’s not a solution out there that doesn’t require changing code.

I was going to open a User Voice item about this, but I wanted to get reader feedback, first, because there are a few different ways to address this that I can think of quickly.

  1. Add _BIG functions for SUM, AVG, etc.
  2. Add cultures similar to CONVERT to indicate int, bigint, or decimal
  3. Change the functions to automatically convert when necessary
  4. Make the default output a BIGINT

There are arguments for and against all of these, but leave a comment with which you think would be best, or your own.

Thanks for reading!


Viewing all articles
Browse latest Browse all 370

Trending Articles