Spiel des Jahres 2021 nominations might still be a couple of months away, but now is as good a time as any to return to one of the harder questions in my predictions post from last year: What exactly makes a game a Kennerspiel? By that I don’t mean the qualities that earn a game the award, but the distinction between the more casual Spiel and the more complex Kennerspiel categories.
This question was particularly pertinent in 2020, when four out of the six nominees for the two adult awards straddled the line between Spiel and Kennerspiel: My City, Nova Luna, Cartographers, and The Crew could all have landed in either category. Jury member Udo Bartsch wrote a very interesting essay on exactly this topic, giving some insight into the reasoning behind the jury’s decision. Basically, the distinction isn’t so much about complexity or depth of play as about approachability: how many people can clear the hurdles that stand between them and playing a particular game? According to Mr Bartsch, this question doesn’t have a simple answer:
Which game is a game for everyone and which is not is unfortunately not recognised by generally applicable, precisely measurable characteristics, but only when playing with as many different people as possible.
Challenge accepted! Who needs humans when we can just deal with data instead? 🤓
Let’s look at the data
Since the introduction of Kennerspiel des Jahres in 2011, a total of 90 games have made the longlist for Spiel des Jahres and 64 the one for Kennerspiel [1]. That’s not a lot of data to draw inferences from, but we’ll try anyway.
The first step is always to familiarise yourself with the task and data at hand. With BoardGameGeek collecting all sorts of data about games, we know features of those jury recommendations like complexity, player age and count, play time, mechanics, themes, …
Why don’t we start simple and plot the games by their complexity (also known as weight) and minimum age?
You can see the jury’s favourites of the past decade lining up from simple (left) to complex (right), and from child friendly (bottom) to more mature (top). Unsurprisingly, the red Spiel des Jahres recommendations generally cluster in the bottom left, while the anthracite Kennerspiel games tend towards the top right. The dotted line is the one that best [2] separates Spiel from Kennerspiel.
However, there is significant overlap. In particular, a lot of games from either award can be found around the intersection of age 10 and complexity 2 (medium light). I’ve used squares to mark the games that fall on the “wrong” side of the line. Some notable outliers are:
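| Game | Year | Complexity | Min. age | Award | Classified correctly? |
|---|---|---|---|---|---|
| The Quacks of Quedlinburg | 2018 | 1.9 | 10+ | K | ❌ |
| That’s Pretty Clever! | 2018 | 1.9 | 8+ | K | ❌ |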
So 2020 really did contain a lot of games right on the border between the two awards.
Generally, this works pretty well for such a simple model (a linear function in two variables is first semester kind of stuff). But some games seem to really push far into the other side, e.g., That’s Pretty Clever! and Kitchen Rush. Are there some other characteristics of those games that explain the jury’s classification?
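For the curious, here is a minimal sketch of how such a separating line can be found. It is not the code behind the plot: the twelve data points are invented, the function names (`fit_logistic`, `centre`, `kennerspiel_probability`) are made up for illustration, and it fits a plain logistic regression by gradient descent without the F1 tuning mentioned in the footnote.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Invented toy data, NOT real jury picks: (complexity, minimum age).
spiel = [(1.1, 6), (1.2, 8), (1.5, 8), (1.8, 8), (2.0, 8), (1.4, 10)]
kenner = [(2.2, 12), (2.4, 10), (2.6, 14), (2.8, 12), (3.0, 10), (3.2, 12)]

def centre(game):
    # Shift so that (complexity 2, age 10) is the origin; this keeps
    # the gradient descent steps well behaved.
    complexity, min_age = game
    return (complexity - 2.0, min_age - 10.0)

rows = [centre(game) for game in spiel + kenner]
labels = [0] * len(spiel) + [1] * len(kenner)  # 1 = Kennerspiel
weights, bias = fit_logistic(rows, labels)

def kennerspiel_probability(complexity, min_age):
    x = centre((complexity, min_age))
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

print(kennerspiel_probability(1.0, 6))   # a light children's game
print(kennerspiel_probability(3.5, 14))  # a heavy game for teenagers and up
```

The dotted line corresponds to all points where the weighted sum is exactly zero, i.e. where the probability is 50%.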
Can we do better?
Complexity and minimum age make a pretty powerful pair, but the only reason I picked exactly two features is that we can nicely visualise everything in 2D. I don’t know about you, but my brain can only handle three dimensions – on a good day…
Mathematics to the rescue! Higher dimensions pose no challenge to our old friend, and we can throw as many variables at it as we want. So let’s add some more features to our model [3]:
- complexity (weight between 1 and 5),
- minimum age (between 6 and 16 years),
- minimum and maximum play time (between 1 minute 🏃 and 3 hours),
- player count (between 1 and 100 👀 players),
- cooperative or competitive game,
- types (e.g., family, strategy, or party game),
- categories (e.g., card, economic, or medieval game), and
- mechanics (e.g., hand management, set collection, or worker placement).
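To feed a list like this into a model, every game has to become a row of numbers. The sketch below shows one plausible encoding, with one yes/no flag per player count and per mechanic. The field names, the three-item `MECHANICS` vocabulary, and the example game are all assumptions for illustration, and types and categories are left out for brevity (they would work just like mechanics).

```python
# Tiny illustrative vocabulary; a real one would list every mechanic in the data.
MECHANICS = ["Dice Rolling", "Hand Management", "Worker Placement"]
PLAYER_COUNTS = range(1, 7)  # one flag each for 1, 2, ..., 6 players

def encode_game(game):
    """Turn a game record (a dict with hypothetical keys) into numeric features."""
    features = {
        "complexity": float(game["complexity"]),
        "min_age": float(game["min_age"]),
        "min_time": float(game["min_time"]),
        "max_time": float(game["max_time"]),
        "cooperative": 1.0 if game["cooperative"] else 0.0,
    }
    for count in PLAYER_COUNTS:
        playable = game["min_players"] <= count <= game["max_players"]
        features[f"players_{count}"] = 1.0 if playable else 0.0
    for mechanic in MECHANICS:
        present = mechanic in game["mechanics"]
        features[f"mechanic:{mechanic}"] = 1.0 if present else 0.0
    return features

# An invented game, not taken from the jury lists.
example = {
    "complexity": 1.9, "min_age": 10, "min_time": 30, "max_time": 45,
    "min_players": 2, "max_players": 4, "cooperative": False,
    "mechanics": ["Dice Rolling"],
}
features = encode_game(example)
for name, value in features.items():
    print(f"{name}: {value}")
```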
Using the same set of games, but incorporating all those values, we can go through the same process that produced the separating line in the plot above (multivariate logistic regression, in case you’re curious). This time, that dividing line would rather be a hyperplane in high dimensional space, but don’t worry about that. In fact, we can do better than just a yes/no classification: We can estimate our confidence that a certain game is in fact a Kennerspiel.
This model classifies a whopping 150 out of 154 games correctly as either Spiel or Kennerspiel – that’s 97.4% accurate. 🤯 So much for not being measurable, Mr Bartsch!
So, let’s take a look back at our problem games from before and check how much confidence our model has that the respective game is for connoisseurs:
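| Game | Year | Award | Kennerspiel score | Classified correctly? |
|---|---|---|---|---|
| The Quacks of Quedlinburg | 2018 | K | 65.3% | ✅ |
| That’s Pretty Clever! | 2018 | K | 51.8% | ✅ |
| The Crew: The Quest for Planet Nine | 2020 | K | 41.7% | 😕 |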
You can find the full results list here.
The picture has certainly improved: we’re now even correctly classifying games like That’s Pretty Clever! (just about) and Kitchen Rush, which caused us a lot of headaches before. However, The Crew still eludes correct classification, and Libertalia is so far off that I’d argue the jury simply got that one wrong…
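In case the step from confidence score to verdict isn’t obvious, here is a small sketch using the three scores from the table. It assumes a cut-off at 50%, which matches the check marks above but is my reading rather than something spelled out, and the helper names `classify` and `accuracy` are made up.

```python
def classify(score, threshold=0.5):
    """Return "K" for Kennerspiel or "S" for Spiel (assumes a 50% cut-off)."""
    return "K" if score >= threshold else "S"

def accuracy(predicted, actual):
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# (game, Kennerspiel score, jury's category) as listed in the table.
rows = [
    ("The Quacks of Quedlinburg", 0.653, "K"),
    ("That's Pretty Clever!", 0.518, "K"),
    ("The Crew: The Quest for Planet Nine", 0.417, "K"),
]
predicted = [classify(score) for _, score, _ in rows]
actual = [award for _, _, award in rows]

for (name, score, award), guess in zip(rows, predicted):
    verdict = "correct" if guess == award else "wrong"
    print(f"{name}: {score:.1%} -> {guess} ({verdict})")

print(accuracy(predicted, actual))  # share of these three games classified correctly
print(round(150 / 154 * 100, 1))    # the overall figure quoted above, in percent
```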
But how does the model work?
Our model takes the different features of a game as described above as input, multiplies each with a certain weight it learned by looking at all the games from previous years, sums those values up, and then yields a prediction in the form of a confidence score (0–100%).
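In code, that recipe is only a few lines. The sketch below assumes the usual logistic regression setup, where the weighted sum (plus a constant offset) is squashed into the 0–1 range by the logistic function. The weights and the example game are invented and much smaller than the real feature set.

```python
import math

def kennerspiel_confidence(features, weights, bias=0.0):
    """Weighted sum of the features, mapped to a value between 0 and 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights for illustration only, not the ones the model learned.
weights = {"complexity": 1.5, "min_age": 0.3, "max_time": 0.02}
bias = -7.5

light_game = {"complexity": 1.5, "min_age": 8, "max_time": 30}
heavy_game = {"complexity": 3.0, "min_age": 12, "max_time": 90}

print(f"{kennerspiel_confidence(light_game, weights, bias):.1%}")
print(f"{kennerspiel_confidence(heavy_game, weights, bias):.1%}")
```

A positive weight means that a higher value of the feature raises the confidence, a negative weight lowers it, and a weighted sum of exactly zero lands on 50%.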
To make things a little more concrete, let’s look at the ten most important features and how they impact the outcome:
You can find the full plot of all features here.
As you can see, complexity is the most important feature: the higher the value (blue = 1, red = 5), the higher our confidence that the game in question is a Kennerspiel. This makes a lot of sense. The next features are equally intuitive: the higher play time and minimum age are, the more likely it is that we’re dealing with a game for experts. And of course, strategy games are much more frequently found in the Kennerspiel column. (Here, red means strategy game, blue means not.)
The player count is somewhat more confusing, though. Pay close attention to whether a game is playable with five or six players. Again, red means that the game is playable with that head count, while blue means it is not. So a game that is playable with five players is more likely to be a Kennerspiel, while a game for six players is less likely. This might well be an artifact of our small sample size. Still, there’s a system to this madness: whilst the vast majority of games accommodate three and four players, Spiel candidates often either stop at that count to keep components and costs down, or are aimed at a much larger audience anyway and really shine with six, eight, or even more players. A Kennerspiel, on the other hand, can be a little more luxurious, and hence often includes components for a fifth player by default, but its more strategic nature limits the scalability beyond that point.
Lastly, we have some mechanics in the top 10. While it’s probably no surprise that solo modes are more common amongst more strategic games, the other mechanics are less intuitive. As it turns out, rolling dice is more prevalent in a Kennerspiel, whilst hand management and worker placement are indicative of a lighter Spiel. But of course, all those different features and weights interact with each other in subtler ways and often balance each other out.
Let’s visualise how our model scored some of those difficult-to-classify games above, starting with My City:
This so-called force plot shows how some values push the score up, while some pull it down. In this case, complexity, strategy, and player count push the score towards Kennerspiel, while being a tile laying game playable in 30 minutes pulls it towards Spiel. In sum, our model and the jury agree: this game falls on the red side of the line.
Contrast this with That’s Pretty Clever! Again, its play time of 30 minutes and minimum age of 8 smell like a lighter game, but soloable dice rolling pushes the needle (very narrowly) over the line into Kennerspiel territory.
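If you wonder where the pushes and pulls in such a plot come from: for a linear model, a common way to compute them is to take each feature’s weight times the difference between the game’s value and an average game’s value. The sketch below shows that idea under the simplifying assumption that features are treated independently. It is not necessarily how the plots above were generated, and all weights, baseline values, and the example game are invented.

```python
def log_odds(features, weights, bias):
    """The raw weighted sum, before it is squashed into a probability."""
    return bias + sum(weights[name] * features[name] for name in weights)

def contributions(features, weights, baseline):
    """Per-feature push relative to a baseline ("average") game."""
    return {
        name: weights[name] * (features[name] - baseline[name])
        for name in weights
    }

# All numbers below are invented for illustration.
weights = {"complexity": 1.5, "play_time": 0.02, "dice_rolling": 0.8, "solo": 0.6}
bias = -4.5
baseline = {"complexity": 2.0, "play_time": 45.0, "dice_rolling": 0.3, "solo": 0.2}
game = {"complexity": 1.9, "play_time": 30.0, "dice_rolling": 1.0, "solo": 1.0}

base_value = log_odds(baseline, weights, bias)
pushes = contributions(game, weights, baseline)

ranked = sorted(pushes.items(), key=lambda item: abs(item[1]), reverse=True)
for name, push in ranked:
    direction = "towards Kennerspiel" if push > 0 else "towards Spiel"
    print(f"{name:>12}: {push:+.2f} ({direction})")
```

The useful property is that the pieces add up: the baseline’s score plus all pushes equals the game’s own score, which is exactly what the stacked arrows in a force plot depict.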
In the end, our evidence is a little thin – even after a decade of Kennerspiel winners and nominees, the patterns aren’t very clear, and of course the jury’s fickle opinion might drift over time as well. So while I’m pretty confident in the predictions our model makes, they still need to be taken with a grain of salt.
“Not so fast!”, you might say. “Aren’t you simply overfitting here?” Why, yes, you’re right. The dataset is so small that there’s a high risk of fine-tuning the model too much to the data we’re seeing. And of course, it’s bad bad bad to assess your model’s performance with items it was trained on – that’s just cheating. So let’s test the model on some games it hasn’t seen yet!
What about old games?
Pre-2011 Spiel des Jahres winners and nominees make a marvellous test set for this model. A lot of those games would be considered a Kennerspiel by today’s standards, so let’s find out which ones.
Again, we’ll start with the simple model that takes the two input variables complexity and minimum age. We can then plot those 70 games and check what side of the line they fall on:
We observe a pretty similar spread along those two axes as in the plot above, so apparently the jury covered games of a broad variety already before 2011, but under the single Spiel des Jahres brand.
Let’s dive deeper and check what Kennerspiel scores our more complex model assigns to some of the more noteworthy Spiel des Jahres winners and nominees:
| Game | Year | Kennerspiel score |
|---|---|---|
| Sherlock Holmes Consulting Detective | 1985 | 99.9% |
| Thurn and Taxis | 2006 | 77.0% |
| Ticket to Ride | 2004 | 12.0% |
You can find the full results list here.
On the one hand, it’s weird to see games like Catan and Pandemic so firmly in the Kennerspiel column when they are considered some of the quintessential modern gateway games. On the other hand, their complexity clearly does exceed by far what the jury demands of the average gamer these days. It’s also worth observing that Catan did pave the way for some pretty complex games in the second half of the 90s, when the euro revolution was in full swing.
As far as validating the model goes: I’d agree with every single one of the model’s assessments, though I’m a little surprised that Citadels got a score of 95.3%. I see good reasons for putting this one into the Kennerspiel camp, but would do so with far more uncertainty.
Overall, according to our model, 9 out of 32 Spiel des Jahres winners between 1979 and 2010 should really be considered a Kennerspiel now. I wonder how many people trusted the red meeple, bought what they thought to be a welcoming game, only to get frustrated by 12 densely filled A4 pages [4] of El Grande rules? Or did people really have much longer attention spans in the pre-smartphone era? We might never know…
(Kenner-)Spiel des Jahres 2021
I’ll send you off with a teaser for Spiel des Jahres 2021. I’ve taken some of the hottest contenders for the 2021 awards (as of the time of writing), and sorted them by their Kennerspiel score for your convenience.
| Game | Kennerspiel score |
|---|---|
| Gloomhaven: Jaws of the Lion | 100.0% |
| Rajas of the Ganges: The Dice Charmers | 80.5% |
| New York Zoo | 63.0% |
| The Castles of Tuscany | 39.3% |
| MicroMacro: Crime City | 1.4% |
| The Key: Raub in der Cliffrock Villa | 0.9% |
This makes a pretty interesting early list of six candidates for Spiel des Jahres 2021 and four candidates for Kennerspiel des Jahres 2021, don’t you think? Stay tuned!
[1] In 2011, there was no separate recommendation list for the two awards, so I only included the nominees for 2011. I also added the special award winners Caylus, Agricola, World Without End, and Pandemic Legacy: Season 2 to the Kennerspiel list.
[2] Using logistic regression with F1 score as the target metric. Other definitions of “best line” might of course yield different results.
[3] It’s worth noting that some of those values can be unreliable. Complexity and game type depend on user votes, which often come from only a handful of contributors. Worse still, categories and mechanics are quite wonky taxonomies which are frequently applied inconsistently. Finally, player count and age as well as play time are taken from the publishers, who do not hesitate to lie about these things if it helps sell their games.
[4] Yes, I did pull out my old copy and count. You’re welcome. 🤓