Reverse engineering the BoardGameGeek ranking – Part 2!

This is the second part of a series explaining and analysing the BoardGameGeek rankings. Read the first part here.

Last time I left you with the nice result that BoardGameGeek (BGG) calculates its ranking by taking users' ratings for a particular game and then adding around 1500-1600 dummy ratings of 5.5. The resulting average, the so-called geek score, is used to sort the games from best (Gloomhaven) to worst (Tic-Tac-Toe).
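As a minimal sketch of that calculation, assume we only know a game's average rating and its number of ratings. The dummies are then folded into the average as if they were ordinary ratings. The function name and parameters are illustrative, not BGG's actual code.

```python
def geek_score(average, num_ratings, num_dummies, dummy_value=5.5):
    """Average of the real ratings plus `num_dummies` extra ratings of `dummy_value`."""
    # Recover the sum of the real ratings from their average and count.
    real_sum = average * num_ratings
    dummy_sum = dummy_value * num_dummies
    return (real_sum + dummy_sum) / (num_ratings + num_dummies)


# A game rated 9.0 by only 100 users barely rises above the 5.5 baseline.
print(geek_score(9.0, 100, 1500))
```

With 1500 dummies this game ends up at about 5.72. Games need many ratings before their own average dominates the score.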

One detail we touched on in passing but did not resolve, however, is how the number of dummy ratings develops over time. When the current calculation method was introduced, BGG founder Scott Alden mentioned that this number would be pegged to the total number of ratings, but he did not reveal any details. Challenge accepted!

To tackle this question, we need to compare the number of dummies to the total number of ratings over time. Fortunately, thanks to the scraping done for Recommend.Games, we have snapshots of the BGG games data covering roughly the past year. Using these snapshots, we can see how the number of games and ratings in the database has grown:

Number of games and ratings on BGG over time

We can now repeat exactly the same calculation as in the previous post: for each point in time, the algorithm searches for the number of dummy ratings that yields estimated geek scores closest to the actual ones. This gives us a set of data points relating the total number of ratings to the number of dummies used at that time. Here’s what it looks like:
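Here is a brute-force sketch of that search for a single snapshot. It assumes each game is given as an (average rating, number of ratings, geek score) tuple, and that "closest" means the smallest sum of squared errors across all games. The function name, the tuple layout and the `max_dummies` cap are illustrative assumptions, not necessarily how the original analysis was coded.

```python
def estimate_dummies(games, max_dummies=5000, dummy_value=5.5):
    """Return the integer dummy count that best reproduces the observed geek scores.

    `games` is a list of (average, num_ratings, geek_score) tuples from one snapshot.
    """

    def squared_error(d):
        # Predicted geek score with d dummies, compared to the observed one, summed over games.
        return sum(
            ((avg * n + dummy_value * d) / (n + d) - observed) ** 2
            for avg, n, observed in games
        )

    # Try every candidate count and keep the one with the smallest total error.
    return min(range(max_dummies + 1), key=squared_error)
```

Running this once per snapshot yields one (total ratings, dummies) pair per point in time. A bisection or closed-form solve would be faster, but an exhaustive search over a few thousand integers is cheap and easy to trust.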

Number of total ratings vs dummy ratings

We get a pretty nice straight line – the dashed line in the plot is fitted with linear regression, i.e., the straight line that most closely fits our data. Its formula is:

\[ \textrm{number of dummies} \approx 0.0000997 \cdot \textrm{total number of ratings}. \]

This means that for every rating entered into the BGG database, the number of dummy ratings increases by \(0.0000997\). That number might look a bit opaque, but it’s actually very easy to interpret once you turn the question on its head: how many ratings have to be entered for the number of dummies to increase by \(1\)? You get the answer by taking the inverse of that factor, which comes out at about \(10\,032\). This number is way too close to \(10\,000\) to be a coincidence! We can therefore conclude that the exact formula for the number of dummy ratings is:

\[ \textrm{number of dummies} = \frac{\textrm{total number of ratings}}{10\,000}. \]
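For readers who want to reproduce the fit, here is a minimal sketch. It assumes the regression line is forced through the origin, which matches the fitted formula having no intercept term, and it takes the (total ratings, dummies) pairs as plain lists. The function names are hypothetical. For a line without intercept, the least-squares slope has the closed form \(\sum xy / \sum x^2\).

```python
def fit_slope_through_origin(total_ratings, dummies):
    """Least-squares slope for dummies ≈ slope * total_ratings (no intercept term)."""
    numerator = sum(x * y for x, y in zip(total_ratings, dummies))
    denominator = sum(x * x for x in total_ratings)
    return numerator / denominator


def ratings_per_dummy(slope):
    """Invert the slope: how many new ratings it takes to add one dummy rating."""
    return 1 / slope
```

Feeding in one pair per snapshot gives the slope, and its inverse is the number of ratings per additional dummy. Since Python 3.11, `statistics.linear_regression(x, y, proportional=True)` performs the same no-intercept fit.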

At the time of writing, there are \(17\,287\,904\) ratings (give or take) in the BGG database, so around 1729 dummy ratings of \(5.5\) are added to each game's regular ratings.
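The arithmetic is a one-liner. This sketch assumes the count is simply the total divided by 10 000. It rounds only for display, since we don't know whether BGG rounds the number internally.

```python
def num_dummies(total_ratings):
    """Dummy ratings of 5.5 implied by the formula: one per 10,000 ratings in the database."""
    return total_ratings / 10_000


print(round(num_dummies(17_287_904)))  # 1729
```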

As the number of BGG users rises steadily, the number of dummy ratings keeps increasing too. This is part of the reason why older games (particularly those with a newer edition) tend to drop in the rankings. When users stop adding new ratings for a game, its average rating more or less freezes. But because more and more dummy ratings are added, its geek score (as long as its average is above 5.5) decreases every time it gets recalculated. So the older games drop in the rankings, while the latest hotness gets all the fresh ratings and shoots up to the top.
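To see the effect in numbers, here is a sketch for a hypothetical older game whose own ratings are frozen at an average of 7.5 over 10 000 ratings, while the site-wide total keeps growing. It combines the dummy formula with the geek score definition. The game's figures and the function name are made up for illustration.

```python
def geek_score_at(average, num_ratings, total_ratings, dummy_value=5.5):
    """Geek score of a game, with the dummy count derived from the site-wide total."""
    dummies = total_ratings / 10_000
    return (average * num_ratings + dummy_value * dummies) / (num_ratings + dummies)


# Hypothetical older game: its own average and rating count no longer change.
frozen_average, frozen_count = 7.5, 10_000
for total in (15_000_000, 17_287_904, 20_000_000):
    print(total, round(geek_score_at(frozen_average, frozen_count, total), 4))
```

Each printed score is lower than the one before, even though nothing about the game itself changed.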

The circle of hype.
