<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Machine Learning on Analysis Paralysis</title><link>https://blog.recommend.games/tags/machine-learning/</link><description>Recent content in Machine Learning on Analysis Paralysis</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>contact@recommend.games (Recommend.Games)</managingEditor><webMaster>contact@recommend.games (Recommend.Games)</webMaster><lastBuildDate>Fri, 12 Dec 2025 12:00:00 +0200</lastBuildDate><atom:link href="https://blog.recommend.games/tags/machine-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Teaching Elo to Play with Friends</title><link>https://blog.recommend.games/posts/teaching-elo-to-play-with-friends/</link><pubDate>Fri, 12 Dec 2025 12:00:00 +0200</pubDate><author>contact@recommend.games (Recommend.Games)</author><guid>https://blog.recommend.games/posts/teaching-elo-to-play-with-friends/</guid><description>&lt;p&gt;At some point this year, I let my laptop run flat-out for almost two weeks just to answer one question: &lt;em&gt;how much of a four-player board game is &amp;ldquo;skill&amp;rdquo; and how much is &amp;ldquo;luck&amp;rdquo;?&lt;/em&gt; That sounds excessive, but there was a catch: before I could even start those simulations, I had to fix a basic problem. Elo – the rating system we&amp;rsquo;ve been happily using so far – only really knows how to handle one-on-one duels.&lt;/p&gt;
&lt;p&gt;This article is the missing technical chapter in the series. In &lt;a href="https://blog.recommend.games/posts/elo-ratings-explained/"&gt;part 1&lt;/a&gt; we met Elo and learned how it turns match results into skill ratings. In &lt;a href="https://blog.recommend.games/posts/world-snooker-champion-2025/"&gt;part 2&lt;/a&gt; we sent those ratings to the Crucible to predict the next World Snooker Champion. And in &lt;a href="https://blog.recommend.games/posts/elo-as-a-skill-o-meter/"&gt;part 3&lt;/a&gt; we stole a clever idea from Dürsch, Lambrecht and Oechssler to turn the spread of Elo ratings in a two-player &amp;ldquo;toy universe&amp;rdquo; into a kind of skill-o-meter: a way to say whether a game behaves more like a 30%-skill world or an 80%-skill world.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s one obvious gap left: most modern board games aren&amp;rsquo;t tidy head-to-head affairs. Around a real table you&amp;rsquo;ll usually find three, four, sometimes five players battling it out in CATAN, Brass, Gaia Project or whatever your current obsession is. If we want to use our shiny skill-o-meter on those games, we first have to teach Elo how to cope with real multiplayer tables instead of just faking them as a stack of two-player matches.&lt;/p&gt;
&lt;p&gt;Fair warning: this part is even more technical than part 3. We&amp;rsquo;ll talk about probability matrices, permutations and a scary-looking formula or two. If that&amp;rsquo;s not your thing, you&amp;rsquo;re still very welcome to skim the maths-heavy bits – I&amp;rsquo;ll keep pointing out the important intuitions along the way. The payoff is worth it: by the end of this article, we&amp;rsquo;ll have a principled multiplayer Elo system and a checked-and-calibrated skill-o-meter that still works when three, four or five people sit down to play.&lt;/p&gt;
&lt;h2 id="why-two-player-elo-isnt-enough-for-modern-games"&gt;Why two-player Elo isn&amp;rsquo;t enough for modern games&lt;/h2&gt;
&lt;p&gt;Elo&amp;rsquo;s original paper was targeted at chess, so naturally it was only concerned with two-player games. Likewise, everything I&amp;rsquo;ve talked about in this series so far has assumed a simple head-to-head match: one player vs another, winner takes the Elo chips.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a problem for our &amp;ldquo;skill-o-meter&amp;rdquo; from part 3, because the outcome of a typical board game isn&amp;rsquo;t a single win or loss: it&amp;rsquo;s a full finishing order, and that&amp;rsquo;s what a multiplayer Elo has to learn to digest.&lt;/p&gt;
&lt;h2 id="how-people-fake-multiplayer-elo-and-why-its-not-quite-right"&gt;How people fake multiplayer Elo (and why it&amp;rsquo;s not quite right)&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re like me and spend a slightly embarrassing amount of your free time on &lt;a href="https://boardgamearena.com/"&gt;Board Game Arena&lt;/a&gt;, you might have noticed their Elo implementation. They simply treat multiplayer games as a collection of 1‑vs‑1 battles. So if Alice, Bob and Carol play a game, their Elo calculations treat this as &lt;em&gt;three&lt;/em&gt; matches: Alice vs Bob, Alice vs Carol and Bob vs Carol. If Alice indeed won the game, Bob came in second and Carol last, Alice would win both her &amp;ldquo;virtual&amp;rdquo; matches and Bob his against Carol. Elo ratings would then be updated according to the regular formula, with \(K\) &amp;ldquo;adjusted for player count&amp;rdquo; (I didn&amp;rsquo;t find an up-to-date source as to the details).&lt;/p&gt;
&lt;p&gt;Conceptually, this is a neat hack but not quite right: it pretends Alice actually played two independent duels against Bob and Carol, even though in reality all three interacted in the same shared game state and their decisions affected each other at the same time.&lt;/p&gt;
&lt;p&gt;Note that for an \(n\)-player game there are \({n \choose 2} = \frac{n(n-1)}{2}\) pairings, so the number of updates grows quadratically with player count. This kind of growing complexity can really come back to bite you in the behind when it comes to compute, but (a) luckily we don&amp;rsquo;t need to worry about matches with hundreds of players in tabletop gaming and (b) it could be &lt;em&gt;much&lt;/em&gt; worse, as we shall see in a minute…&lt;/p&gt;
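&lt;p&gt;To make the pairwise approach concrete, here is a minimal Python sketch of it. Two details are my own assumptions rather than Board Game Arena&amp;rsquo;s documented behaviour: the finishing order has no ties, and the &amp;ldquo;player count adjustment&amp;rdquo; simply divides \(K\) by \(n-1\). The function names are made up for illustration.&lt;/p&gt;

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Classic two-player Elo win probability of A against B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def pairwise_update(ratings, finishing_order, k=32.0):
    """Rate one n-player game as n * (n - 1) / 2 virtual duels.

    ratings: dict of player -> current rating
    finishing_order: players from winner to last place (no ties assumed)
    k: base update factor. Dividing it by n - 1 is a HYPOTHETICAL
       player-count adjustment, not a documented Board Game Arena rule.
    """
    n = len(finishing_order)
    k_adj = k / (n - 1)
    deltas = {player: 0.0 for player in finishing_order}
    # combinations() keeps the input order, so in each pair the first
    # player finished ahead and counts as the winner of that duel.
    for winner, loser in combinations(finishing_order, 2):
        change = k_adj * (1 - expected_score(ratings[winner], ratings[loser]))
        deltas[winner] += change
        deltas[loser] -= change
    # Apply all changes at once, so every duel uses the pre-game ratings.
    return {player: r + deltas.get(player, 0.0) for player, r in ratings.items()}
```

&lt;p&gt;With Alice, Bob and Carol all starting at 0 and \(K = 32\), each of the three virtual duels moves 8 points, so Alice gains 16, Bob breaks even and Carol loses 16.&lt;/p&gt;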
&lt;h2 id="a-more-principled-multiplayer-elo-ranking-probabilities"&gt;A more principled multiplayer Elo: ranking probabilities&lt;/h2&gt;
&lt;p&gt;In &lt;a href="https://blog.recommend.games/posts/elo-as-a-skill-o-meter/"&gt;part 3&lt;/a&gt;, we already leaned on a neat idea by Peter Dürsch, Marco Lambrecht and Jörg Oechssler from their paper &amp;ldquo;&lt;a href="https://doi.org/10.1016/j.euroecorev.2020.103472"&gt;Measuring skill and chance in games&lt;/a&gt;&amp;rdquo; (2020). There we used their framework to turn the spread of Elo ratings into a &amp;ldquo;skill-o-meter&amp;rdquo; for two-player games. In this article, we&amp;rsquo;re going back to the same well: DLO also propose a way to run Elo on proper multiplayer tables, and that&amp;rsquo;s exactly the tool we need for modern board games.&lt;/p&gt;
&lt;h3 id="from-table-results-to-expected-payoffs"&gt;From table results to expected payoffs&lt;/h3&gt;
&lt;p&gt;Dürsch et al suggest a more principled way to deal with multiplayer tables. Let \(n\) be the number of players in the match. Instead of pretending everyone played everyone else in separate duels, they directly model the whole finishing order at once.&lt;/p&gt;
&lt;p&gt;The first ingredient is an \(n\times n\) matrix of probabilities:&lt;/p&gt;
&lt;p&gt;\[
p_{ij} = P(\text{player $i$ finishes in position $j$}).
\]&lt;/p&gt;
&lt;p&gt;You can read row \(i\) as &amp;ldquo;what&amp;rsquo;s the chance player \(i\) finishes 1st, 2nd, …, last?&amp;rdquo; and column \(j\) as &amp;ldquo;who is most likely to end up in position \(j\)?&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Just like in the two-player case, we need a numerical payoff to compare expectations with reality. For an \(n\)-player game we give the winner \(n-1\) points, the runner-up \(n-2\), all the way down to 0 for last place.&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; If there are ties, each tied player gets the average of the payoffs for the positions they share: two players tied for first at a four-player table would each get \((3 + 2)/2 = 2.5\) points. Weighting each position&amp;rsquo;s payoff by the probability of finishing there gives us the expected payoff for player \(i\):&lt;/p&gt;
&lt;p&gt;\[
e_i = E[\text{payoff for player $i$}] = \sum_{j=0}^{n-1} p_{ij} (n - 1 - j).
\]&lt;/p&gt;
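&lt;p&gt;In code, both the tie-averaged actual payoffs and the expected payoffs take only a few lines. This is a sketch, not the article&amp;rsquo;s implementation; the input format for ties (each player&amp;rsquo;s 0-based place, with tied players sharing the best place of their group, as in &lt;code&gt;[0, 0, 2, 3]&lt;/code&gt;) is my own assumption.&lt;/p&gt;

```python
def actual_payoffs(places):
    """Rank payoffs (n - 1 for the winner down to 0 for last), ties averaged.

    places[i] is player i's 0-based finishing place. Assumed convention:
    tied players share the best place of their group and the next place
    is skipped, e.g. [0, 0, 2, 3] when two players share the win.
    """
    n = len(places)
    payoffs = []
    for place in places:
        tied = places.count(place)
        # The tied group covers positions place, place + 1, ..., place + tied - 1.
        straddled = [n - 1 - j for j in range(place, place + tied)]
        payoffs.append(sum(straddled) / tied)
    return payoffs


def expected_payoffs(p):
    """e_i = sum over j of p[i][j] * (n - 1 - j), for an n x n matrix p."""
    n = len(p)
    return [sum(p_ij * (n - 1 - j) for j, p_ij in enumerate(row)) for row in p]
```

&lt;p&gt;For two players sharing first place at a four-player table, &lt;code&gt;actual_payoffs([0, 0, 2, 3])&lt;/code&gt; gives &lt;code&gt;[2.5, 2.5, 1.0, 0.0]&lt;/code&gt;; the payoffs still add up to \(n(n-1)/2 = 6\), just as without ties.&lt;/p&gt;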
&lt;p&gt;Once we have that, the Elo update looks exactly like before. Let \(a_i\) be the actual payoff (from the final ranking, scaled in the same way). We compare \(a_i\) to \(e_i\), and shift the rating in the direction of the surprise:&lt;/p&gt;
&lt;p&gt;\[
r_i \leftarrow r_i + \frac{K}{n-1} (a_i - e_i).
\]&lt;/p&gt;
&lt;p&gt;The factor \(1/(n-1)\) just normalises things: payoffs now run from 0 to \(n-1\) instead of 0 to 1, so dividing by \(n-1\) keeps each player&amp;rsquo;s rating change per game within \(\pm K\) &amp;ldquo;chips&amp;rdquo;, just as in the two-player version.&lt;/p&gt;
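&lt;p&gt;As a sketch (the function name and argument layout are my own), the update itself is a one-liner once the actual payoffs \(a_i\) and expected payoffs \(e_i\) are known:&lt;/p&gt;

```python
def multiplayer_update(ratings, actual, expected, k=32.0):
    """Replace each r_i by r_i + K / (n - 1) * (a_i - e_i).

    ratings, actual, expected: lists aligned by seat, where actual holds the
    rank payoffs a_i and expected the model's expected payoffs e_i.
    """
    n = len(ratings)
    step = k / (n - 1)
    return [r + step * (a - e) for r, a, e in zip(ratings, actual, expected)]
```

&lt;p&gt;For four equally rated players every expected payoff is 1.5, so with \(K = 32\) the winner gains \(32/3 \cdot 1.5 = 16\) points and the last-placed player loses 16. Because the actual payoffs always sum to \(n(n-1)/2\), and so do the expected payoffs (each column of the probability matrix sums to 1), the updates at a table always add up to zero.&lt;/p&gt;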
&lt;h3 id="from-elo-ratings-to-ranking-probabilities"&gt;From Elo ratings to ranking probabilities&lt;/h3&gt;
&lt;p&gt;This is where things get a bit heavy. If you&amp;rsquo;re mostly here for the big picture, feel free to skim or even skip the formulae in this section — I&amp;rsquo;ll summarise the important part again at the end.&lt;/p&gt;
&lt;p&gt;Conceptually, what we want is simple: for a given set of Elo ratings, we assign a probability to each possible finishing order of the players. Stronger players should be more likely to end up near the top, weaker ones near the bottom. Once we have those probabilities, we can add them up to get the chance that a particular player finishes in a particular position.&lt;/p&gt;
&lt;p&gt;Formally, we write a possible ranking as a permutation \(\tau\) of \(\{0, \dots, n-1\}\), where \(\tau(j)\) tells us &lt;em&gt;which player&lt;/em&gt; ends up in position \(j\) (with \(j=0\) for the winner, \(j=1\) for second place, and so on). The probability of seeing a particular ranking \(\tau\) can be written using the &lt;a href="https://en.wikipedia.org/wiki/Chain_rule_(probability)"&gt;chain rule of probability&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;\[
P(\tau) = P(\text{players $\tau(0), \dots, \tau(n - 1)$ on positions $0, \dots, n - 1$}) \\
= \prod_{j=0}^{n-1} P(\text{player $\tau(j)$ on position $j$} \mid \text{players $\tau(0), \dots, \tau(j - 1)$ fixed above}).
\]&lt;/p&gt;
&lt;p&gt;To estimate those conditional probabilities, Dürsch et al use the &lt;a href="https://en.wikipedia.org/wiki/Softmax_function"&gt;softmax&lt;/a&gt; over Elo ratings. Softmax is just the multiplayer cousin of the Elo win-probability formula: you take a &amp;ldquo;strength score&amp;rdquo; for each player, exponentiate it, and then divide by the sum so that everything adds up to 1. At each step \(j\), we look at the players who haven&amp;rsquo;t been placed yet and assign probabilities proportional to \(10^{r / 400}\), just like in the two-player Elo formula. If we write \(r_i\) for the current rating of player \(i\), this gives:&lt;/p&gt;
&lt;p&gt;\[
P(\text{player $\tau(j)$ on position $j$} \mid \text{players $\tau(0), \dots, \tau(j - 1)$ fixed above}) \\
= \frac{10^{r_{\tau(j)} / 400}}{\sum_{k=j}^{n-1} 10^{r_{\tau(k)} / 400}}.
\]&lt;/p&gt;
&lt;p&gt;Plugging this into the chain rule expression yields a compact formula for the probability of a full ranking \(\tau\):&lt;/p&gt;
&lt;p&gt;\[
P(\tau) = \prod_{j=0}^{n-1} \frac{10^{r_{\tau(j)} / 400}}{\sum_{k=j}^{n-1} 10^{r_{\tau(k)} / 400}}.
\]&lt;/p&gt;
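&lt;p&gt;The formula for \(P(\tau)\) translates almost literally into code. Statisticians know this sequential softmax construction as the Plackett–Luce model. In the minimal sketch below, &lt;code&gt;order&lt;/code&gt; plays the role of \(\tau\) (&lt;code&gt;order[j]&lt;/code&gt; is the player in position \(j\)); the function name is my own.&lt;/p&gt;

```python
def ranking_probability(ratings, order):
    """P(tau) under the sequential softmax (Plackett-Luce) model.

    ratings: Elo rating of each player, indexed by player number
    order: order[j] is the player finishing in position j (0 = winner)
    """
    strengths = [10 ** (ratings[player] / 400) for player in order]
    prob = 1.0
    for j, strength in enumerate(strengths):
        # Softmax over the players still unplaced, i.e. positions j, ..., n - 1.
        prob *= strength / sum(strengths[j:])
    return prob
```

&lt;p&gt;For two players rated 400 and 0, the stronger one wins with probability \(10/11 \approx 0.91\), exactly the two-player Elo prediction. With extreme ratings you could subtract the maximum rating from all of them first, which leaves every ratio unchanged and keeps the powers of 10 small.&lt;/p&gt;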
&lt;p&gt;Now, to get the entries of our probability matrix, we just have to sum over all rankings that put a given player in a given position. Remember that \(p_{ij}\) is the probability that player \(i\) finishes in position \(j\). With the convention \(\tau(j) = i\) meaning &amp;ldquo;player \(i\) sits in position \(j\)&amp;rdquo;, we have:&lt;/p&gt;
&lt;p&gt;\[
p_{ij} = \sum_{\tau \text{ with } \tau(j) = i} P(\tau).
\]&lt;/p&gt;
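&lt;p&gt;Summing over all rankings is equally direct. This brute-force sketch (function names are my own) loops over all \(n!\) permutations with &lt;code&gt;itertools.permutations&lt;/code&gt; and adds each ranking&amp;rsquo;s probability to the matching cells \(p_{\tau(j),j}\):&lt;/p&gt;

```python
from itertools import permutations


def ranking_probability(ratings, order):
    """P(tau): product of softmax choices over the not-yet-placed players."""
    strengths = [10 ** (ratings[player] / 400) for player in order]
    prob = 1.0
    for j, strength in enumerate(strengths):
        prob *= strength / sum(strengths[j:])
    return prob


def position_matrix(ratings):
    """p[i][j] = P(player i finishes in position j), summing over all n! orders."""
    n = len(ratings)
    p = [[0.0] * n for _ in range(n)]
    for order in permutations(range(n)):
        prob = ranking_probability(ratings, order)
        for j, player in enumerate(order):
            p[player][j] += prob  # this ranking has tau(j) = player
    return p
```

&lt;p&gt;As a sanity check, every row and every column of the result sums to 1: each player finishes somewhere, and every position is taken by someone.&lt;/p&gt;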
&lt;p&gt;If the formulae lost you at some point, that&amp;rsquo;s OK — the story is simple: we assign a probability to each possible finishing order based on the Elo ratings, and then sum those probabilities to find out how likely each player is to end up in each position. That&amp;rsquo;s all you really need to remember from this section.&lt;/p&gt;
&lt;h3 id="does-this-really-generalise-two-player-elo"&gt;Does this really generalise two-player Elo?&lt;/h3&gt;
&lt;p&gt;You might still wonder if it&amp;rsquo;s really justified to call this a generalisation of two-player Elo, since it looks rather different at first glance. The crucial sanity check is that when we only have \(n = 2\) players at the table, all of this machinery collapses back to the usual head-to-head model: there are only two possible rankings, the probability matrix reduces to the familiar win–loss probabilities, the payoff vector \((1, 0)\) just scores win vs loss, and the update rule becomes exactly the original Elo formula again.&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; You don&amp;rsquo;t need to wade through the algebra – the important point is that for ordinary two-player encounters, this system behaves just like classic Elo.&lt;/p&gt;
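&lt;p&gt;If you would rather not do the algebra, a quick numerical check (not a proof) is to push one update through the general machinery with \(n = 2\) and compare it to the textbook formula. Both functions below are illustrative sketches with made-up names:&lt;/p&gt;

```python
from itertools import permutations


def multiplayer_elo_step(ratings, payoffs, k):
    """One multiplayer Elo update: softmax ranking model, rank payoffs, K / (n - 1)."""
    n = len(ratings)
    strengths = [10 ** (r / 400) for r in ratings]
    expected = [0.0] * n
    for order in permutations(range(n)):
        # P(tau) as a product of softmax choices among the unplaced players
        prob = 1.0
        for j, player in enumerate(order):
            prob *= strengths[player] / sum(strengths[q] for q in order[j:])
        # accumulate e_i = sum_j p_ij * (n - 1 - j)
        for j, player in enumerate(order):
            expected[player] += prob * (n - 1 - j)
    return [r + k / (n - 1) * (a - e) for r, a, e in zip(ratings, payoffs, expected)]


def classic_elo_step(r_a, r_b, s_a, k):
    """Textbook two-player Elo update for player A (s_a is 1, 0.5 or 0)."""
    p_a = 1 / (1 + 10 ** (-(r_a - r_b) / 400))
    return r_a + k * (s_a - p_a)
```

&lt;p&gt;For any pair of ratings, outcome (1, 0 or 0.5) and \(K\), the first entry of &lt;code&gt;multiplayer_elo_step([r_a, r_b], [s_a, 1 - s_a], k)&lt;/code&gt; agrees with &lt;code&gt;classic_elo_step(r_a, r_b, s_a, k)&lt;/code&gt; up to floating-point rounding.&lt;/p&gt;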
&lt;h3 id="the-price-of-doing-it-properly-combinatorics-and-compute"&gt;The price of doing it properly: combinatorics and compute&lt;/h3&gt;
&lt;p&gt;There is one big catch we&amp;rsquo;ve glossed over so far. To calculate the entries of the probability matrix \(p_{ij}\), we have to sum \(P(\tau)\) over all possible rankings \(\tau\). If you remember your combinatorics basics, you&amp;rsquo;ll know that there are \(n!\) permutations of \(n\) players – a function that grows even faster than exponential. In other words: a straightforward implementation of this model is computationally very expensive.&lt;/p&gt;
&lt;p&gt;Does this mean the whole approach is doomed? Luckily, not quite. Most board games have at most five or six players, and \(6! = 720\) is big but still perfectly manageable on a modern computer. That covers the vast majority of situations we care about in tabletop gaming.&lt;/p&gt;
&lt;p&gt;For higher player counts there are more efficient tricks (for example dynamic programming and Monte Carlo approximations) that avoid looping over all permutations explicitly. I&amp;rsquo;m not going to go into the details here; if you&amp;rsquo;re curious, you can have a look at the implementation in the code for this article – but for our purposes it&amp;rsquo;s enough to know that the full model is tractable for realistic games.&lt;/p&gt;
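&lt;p&gt;To give a flavour of the dynamic-programming route, here is one possible exact approach, sketched by me and not necessarily what the code for this article does. Instead of enumerating orders, it tracks the probability that a given &lt;em&gt;set&lt;/em&gt; of players occupies the top places, regardless of their internal order. There are only \(2^n\) such sets, so the work grows roughly like \(2^n n^2\) instead of \(n!\,n\), at the cost of keeping all \(2^n\) sets in memory:&lt;/p&gt;

```python
from itertools import combinations


def position_matrix_dp(ratings):
    """Exact p[i][j] via dynamic programming over sets of already-placed players.

    top[S] is the probability that exactly the players in S take the first
    len(S) places (in any order). This simple version needs on the order of
    2**n * n**2 operations instead of n! * n.
    """
    n = len(ratings)
    w = [10 ** (r / 400) for r in ratings]

    def remaining_strength(placed):
        return sum(w[m] for m in range(n) if m not in placed)

    top = {frozenset(): 1.0}
    for size in range(1, n + 1):
        for members in combinations(range(n), size):
            s = frozenset(members)
            # The last of the top-`size` places went to some player k in S.
            top[s] = sum(
                top[s - {k}] * w[k] / remaining_strength(s - {k}) for k in s
            )

    p = [[0.0] * n for _ in range(n)]
    for s, prob in top.items():
        j = len(s)  # the next free position
        if j == n:
            continue
        rest = remaining_strength(s)
        for i in range(n):
            if i not in s:
                p[i][j] += prob * w[i] / rest
    return p
```

&lt;p&gt;For small tables it reproduces the brute-force matrix up to rounding, and it stays fast well beyond six players. For much larger tables the \(2^n\) sets themselves become the bottleneck, which is where Monte Carlo sampling is the more practical option.&lt;/p&gt;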
&lt;h3 id="multiplayer-p-deterministic-games"&gt;Multiplayer p-deterministic games&lt;/h3&gt;
&lt;p&gt;Right, after so much theory you deserve something a bit more concrete. Real-world applications will come in the next article; for now, there&amp;rsquo;s still one more thing to check: do the multiplayer versions of the \(p\)-deterministic game behave in the same way as the two-player toy world we built in part 3?&lt;/p&gt;
&lt;p&gt;The setup remains almost the same. We fix an underlying skill ranking for all players. For each game, we flip a weighted coin: with probability \(p\) we play a game of pure skill, where players finish in order of their underlying strength; with probability \(1-p\) we play a game of pure chance, where the finishing order is just a random permutation of the players. It&amp;rsquo;s the same toy universe as before, just with more than two players sitting at the table each time.&lt;/p&gt;
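&lt;p&gt;A single game in this toy universe can be sketched in a few lines of Python. Passing the table in strongest-first, and the function name itself, are my own choices for illustration:&lt;/p&gt;

```python
import random


def play_p_deterministic(skill_order, p, rng=random):
    """Finishing order of one game in the multiplayer p-deterministic toy world.

    skill_order: the players at the table, strongest first (their fixed
    underlying skill ranking). With probability p the game is pure skill and
    they finish in exactly that order; otherwise it is pure chance and the
    finishing order is a uniformly random permutation.
    """
    if p > rng.random():
        return list(skill_order)
    return rng.sample(list(skill_order), len(skill_order))
```

&lt;p&gt;If players are numbered by their underlying skill rank (0 for the strongest), a four-player table can be seated with &lt;code&gt;sorted(rng.sample(range(population), 4))&lt;/code&gt;, which lists the drawn players strongest-first, ready to pass in as &lt;code&gt;skill_order&lt;/code&gt; (&lt;code&gt;population&lt;/code&gt; being the hypothetical number of players in the pool).&lt;/p&gt;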
&lt;h3 id="the-σ-vs-p-benchmark-still-holds-for-up-to-15-players"&gt;The σ vs p benchmark still holds for up to 15 players&lt;/h3&gt;
&lt;p&gt;With this multiplayer version of the \(p\)-deterministic game in hand, we can run the same kind of simulations as before. For each choice of \(p\) and for player counts between 2 and 15, we let lots of games play out, calibrate \(K\) on the simulated match data, compute the resulting Elo ratings and record their standard deviation \(\sigma\). Plotting \(\sigma\) against \(p\) for each player count gives us this family of curves:&lt;/p&gt;

&lt;img
	src="https://blog.recommend.games/posts/teaching-elo-to-play-with-friends/p_deterministic_vs_sigma.png"
	alt="p_deterministic vs σ for various player counts"
	 /&gt;

&lt;p&gt;All of these curves are smooth and strictly increasing: as we turn up \(p\) and let skill matter more often, the Elo spread \(\sigma\) grows, just like in the two-player case. More interestingly, the curves for all player counts from 2 up to 15 are essentially indistinguishable: for each value of \(p\), the coloured dots sit almost exactly on top of each other. Any tiny wobble visible at very high \(p\) is well within the range of simulation noise and numerical quirks.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s precisely the behaviour we were hoping to see. Empirically, in this toy universe \(\sigma\) is effectively a function of \(p\) alone and — within our numerical precision — invariant to how many players sit at the table, even up to 15. In practical terms, this means that if we measure a standard deviation \(\sigma\) in a real three-, four- or five-player game, we can safely read off a corresponding &amp;ldquo;\(p\)-skill world&amp;rdquo; from this benchmark without worrying about the exact player count.&lt;/p&gt;
&lt;p&gt;Talking of the computational effort: getting this last plot alone down to &lt;em&gt;only&lt;/em&gt; about two weeks of wall-clock time on my poor laptop took a fair bit of optimisation. The result might look a little underwhelming after all that build-up, but that&amp;rsquo;s exactly the point: after grinding through all those simulations, the curves stubbornly agree that player count basically doesn&amp;rsquo;t matter. 🔥😅🤓&lt;/p&gt;
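&lt;p&gt;The core of such a simulation could look roughly like the sketch below. It is deliberately simplified and is not the code behind the plot: it uses a fixed \(K\) instead of calibrating it, brute-forces the expected payoffs (so it is only practical for small tables), and every parameter default (population size, number of games, \(K\), seed) is an arbitrary assumption.&lt;/p&gt;

```python
import random
import statistics
from itertools import permutations


def expected_payoffs(ratings):
    """e_i under the sequential softmax model, by brute force over all n! orders."""
    n = len(ratings)
    w = [10 ** (r / 400) for r in ratings]
    e = [0.0] * n
    for order in permutations(range(n)):
        prob = 1.0
        for j, player in enumerate(order):
            prob *= w[player] / sum(w[q] for q in order[j:])
        for j, player in enumerate(order):
            e[player] += prob * (n - 1 - j)
    return e


def simulate_sigma(p, n_players=4, population=200, games=20000, k=16.0, seed=0):
    """Standard deviation of Elo ratings in a simulated p-deterministic world.

    Player 0 is the strongest and player population - 1 the weakest.
    ASSUMPTION: K is fixed here; the article calibrates K on the data instead.
    """
    rng = random.Random(seed)
    ratings = [0.0] * population
    for _ in range(games):
        # Seat a random table, listed strongest-first.
        table = sorted(rng.sample(range(population), n_players))
        if p > rng.random():
            order = table  # pure skill: finish in order of strength
        else:
            order = rng.sample(table, n_players)  # pure chance
        # Expected payoffs use the pre-game ratings of everyone at the table.
        expected = expected_payoffs([ratings[player] for player in table])
        for seat, player in enumerate(table):
            actual = n_players - 1 - order.index(player)
            ratings[player] += k / (n_players - 1) * (actual - expected[seat])
    return statistics.pstdev(ratings)
```

&lt;p&gt;Running &lt;code&gt;simulate_sigma&lt;/code&gt; over a grid of \(p\) values and player counts, with \(K\) calibrated for each run, would produce one dot per combination in the \(\sigma\) vs \(p\) plot.&lt;/p&gt;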
&lt;h2 id="where-this-leaves-us-and-whats-next"&gt;Where this leaves us (and what&amp;rsquo;s next)&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;ve covered a lot of ground in this article, and the payoff is twofold.&lt;/p&gt;
&lt;p&gt;First, we now have a principled way to run Elo on real multiplayer tables. Instead of faking CATAN or Brass as a pile of head-to-head duels, we can model the whole finishing order at once, get sensible expected payoffs for each seat, and update ratings in a way that reduces to classic two-player Elo when there are only two people at the table.&lt;/p&gt;
&lt;p&gt;Second, we&amp;rsquo;ve stress-tested our &amp;ldquo;Elo-as-a-skill-o-meter&amp;rdquo; from part 3 in a richer toy universe. In those \(p\)-deterministic worlds, the standard deviation \(\sigma\) of Elo ratings turns out to depend almost entirely on \(p\) and, within numerical accuracy, not on how many players sit down to play. That means \(\sigma\) really does behave like a calibrated skill dial we can use for 2–6 player games.&lt;/p&gt;
&lt;p&gt;Put together, this gives us the toolset we wanted: given real multiplayer game logs, we can (a) fit Elo using the multiplayer update, (b) calibrate \(K\) on predictive accuracy, and (c) read off the resulting \(\sigma\) and map it to a &amp;ldquo;skill fraction&amp;rdquo; \(p\) using our benchmark curve.&lt;/p&gt;
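&lt;p&gt;Step (c) boils down to inverting the benchmark curve. Since \(\sigma\) grows strictly with \(p\), linear interpolation between simulated points is one simple way to do it. The sketch below is my own illustration: the benchmark lists are placeholders that would have to come from the \(p\)-deterministic simulations, and clamping values outside the simulated range is an assumption.&lt;/p&gt;

```python
from bisect import bisect_left


def sigma_to_p(sigma, benchmark_p, benchmark_sigma):
    """Invert a benchmark curve sigma(p) by linear interpolation.

    benchmark_p, benchmark_sigma: matching lists from p-deterministic
    simulations, sorted by p, with sigma strictly increasing.
    ASSUMPTION: values outside the simulated range are clamped to its ends.
    """
    if benchmark_sigma[0] >= sigma:
        return benchmark_p[0]
    if sigma >= benchmark_sigma[-1]:
        return benchmark_p[-1]
    hi = bisect_left(benchmark_sigma, sigma)  # first benchmark sigma at or above sigma
    lo = hi - 1
    frac = (sigma - benchmark_sigma[lo]) / (benchmark_sigma[hi] - benchmark_sigma[lo])
    return benchmark_p[lo] + frac * (benchmark_p[hi] - benchmark_p[lo])
```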
&lt;p&gt;Next time, we&amp;rsquo;ll finally unleash this machinery on actual board games. We&amp;rsquo;ll look at real play logs, see which games behave more like 30%-skill worlds and which ones look closer to 80% skill, and maybe settle a few pub arguments along the way. 🤓&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Dürsch et al use a flexible payoff structure which makes the formulae and implementation more confusing. For our purposes, the fixed payoff based on ranks is enough, so I tried to keep things simple.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;If you&amp;rsquo;re itching to do the algebra yourself, be my guest — that&amp;rsquo;s the unofficial &amp;ldquo;exercise to the reader&amp;rdquo; for this section. I decided you didn&amp;rsquo;t need to watch me juggle minus signs for a page.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>Elo as a Skill-O-Meter</title><link>https://blog.recommend.games/posts/elo-as-a-skill-o-meter/</link><pubDate>Fri, 05 Dec 2025 12:00:00 +0200</pubDate><author>contact@recommend.games (Recommend.Games)</author><guid>https://blog.recommend.games/posts/elo-as-a-skill-o-meter/</guid><description>&lt;p&gt;Whether a game counts as &amp;ldquo;skill&amp;rdquo; or &amp;ldquo;chance&amp;rdquo; isn&amp;rsquo;t just a pub argument — in many countries it&amp;rsquo;s a legal distinction. Roulette and blackjack live on the &amp;ldquo;chance&amp;rdquo; side; tennis and chess are filed under &amp;ldquo;skill&amp;rdquo;. Different rules, different taxes, different ways for people to lose money.&lt;/p&gt;
&lt;p&gt;The trouble is that this line is usually drawn by tradition and gut feeling. Is poker really &amp;ldquo;more skill&amp;rdquo; than backgammon? Is snooker closer to roulette or closer to chess? A group of economists tried to answer that question more systematically: instead of arguing, measure how &amp;ldquo;skill-heavy&amp;rdquo; a game is in practice by looking at the Elo ratings of all its players. We&amp;rsquo;ll meet their work properly in a bit.&lt;/p&gt;
&lt;p&gt;In this article I want to steal that idea for board games. So far we&amp;rsquo;ve used Elo to track individual player strength; this time we&amp;rsquo;ll go one level up. Instead of asking &lt;em&gt;who&lt;/em&gt; is strong, we&amp;rsquo;ll look at the whole &lt;em&gt;distribution&lt;/em&gt; of Elo ratings in a game and see what its spread can tell us about luck and skill — turning Elo into a kind of &amp;ldquo;skill-o-meter&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="from-elo-ratings-to-skill-distributions"&gt;From Elo ratings to skill distributions&lt;/h2&gt;
&lt;p&gt;By now the basics of Elo should be familiar: each player gets a rating that reflects their playing strength, rating differences go into a simple formula to give expected win probabilities, and after each match we update those ratings based on whether players beat expectations. If you want the full story (including all the maths and the logistic regression detour), &lt;a href="https://blog.recommend.games/posts/elo-ratings-explained/"&gt;part 1&lt;/a&gt; has you covered; here we&amp;rsquo;ll treat Elo as a black box that turns match results into reasonable estimates of player skill.&lt;/p&gt;
&lt;p&gt;In &lt;a href="https://blog.recommend.games/posts/world-snooker-champion-2025/"&gt;part 2&lt;/a&gt; we applied this system to predict the 2025 World Snooker Champion. The model&amp;rsquo;s favourite, John Higgins, didn&amp;rsquo;t manage to win his fifth title, but it did give eventual winner Zhao Xintong a 10.6% chance when the bookies only gave him 5.9%. I&amp;rsquo;ll take that as a personal win — and as evidence that Elo isn&amp;rsquo;t just numerology, it really does capture something about players&amp;rsquo; skills.&lt;/p&gt;
&lt;h3 id="wider-distributions-more-skill"&gt;Wider distributions, more skill&lt;/h3&gt;
&lt;p&gt;What we&amp;rsquo;re really after is a way to say &lt;em&gt;how much&lt;/em&gt; skill is involved in a game, and to compare different games with each other. The individual Elo numbers are just a stepping stone. To get anywhere, we have to zoom out and look at the whole &lt;em&gt;distribution&lt;/em&gt; of skills, as measured by all players&amp;rsquo; Elo ratings.&lt;/p&gt;
&lt;p&gt;The basic idea is simple: if a game is mostly luck and players&amp;rsquo; decisions don&amp;rsquo;t matter much, nobody can reliably stay ahead for long. You&amp;rsquo;ll see some winning streaks, but they&amp;rsquo;ll wash out again, and everyone&amp;rsquo;s ratings will cluster around 0. In a game where skill really matters, the strongest players win more often than they lose and slowly drift away from the pack. The result is a much wider Elo distribution: a long tail of very strong players and a long tail of weaker ones.&lt;/p&gt;
&lt;h3 id="a-first-look-snooker-vs-tennis"&gt;A first look: snooker vs tennis&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s make this more concrete. We&amp;rsquo;ve already calculated Elo ratings for snooker, so let&amp;rsquo;s compare it to another English upper-class sport played on a green surface: tennis. Can you tell which one is more skill-based? To get a more objective answer, we look at the Elo distributions for both and see which one is wider:&lt;/p&gt;

&lt;img
	src="https://blog.recommend.games/posts/elo-as-a-skill-o-meter/elo_distribution_snooker_tennis_wta.png"
	alt="The Elo distributions for Snooker and Tennis (WTA)"
	 /&gt;

&lt;p&gt;According to this plot, the Elo ratings of snooker players have a much higher peak and shorter tails, which suggests that outcomes are more influenced by luck. Tennis — at least on the WTA, the women&amp;rsquo;s tour&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt; — seems to show a wider spread and therefore more room for skill. But how do we know these distributions are even comparable? And can we turn that vague &amp;ldquo;more luck, more skill&amp;rdquo; into an actual number?&lt;/p&gt;
&lt;h2 id="turning-spread-into-a-skill-measure"&gt;Turning spread into a skill measure&lt;/h2&gt;
&lt;p&gt;To answer these questions, we need to properly dive into the science. 🧑‍🔬&lt;/p&gt;
&lt;h3 id="the-one-neat-trick-economists-use-to-measure-skill-that-gamers-can-steal"&gt;The one neat trick economists use to measure skill (that gamers can steal)&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;m going to lean on a neat idea by Peter Dürsch, Marco Lambrecht and Jörg Oechssler, from their paper &amp;ldquo;&lt;a href="https://doi.org/10.1016/j.euroecorev.2020.103472"&gt;Measuring skill and chance in games&lt;/a&gt;&amp;rdquo; (2020). They come from an economics background and originally cared about gambling regulation, but the trick itself is much more general: take the Elo ratings for all players in a game, look at their &lt;em&gt;distribution&lt;/em&gt;, and from that pin down a single number that tells you where the game sits on the spectrum between &amp;ldquo;pure chance&amp;rdquo; and &amp;ldquo;pure skill&amp;rdquo;. That&amp;rsquo;s exactly what we&amp;rsquo;re trying to do here — just for board games instead of casinos. 🤑&lt;/p&gt;
&lt;p&gt;So we&amp;rsquo;ll follow their lead and focus on the &lt;em&gt;spread&lt;/em&gt; of the Elo distribution as our measure of how much skill shows up in a game.&lt;/p&gt;
&lt;h3 id="standard-deviation-of-elo-ratings"&gt;Standard deviation of Elo ratings&lt;/h3&gt;
&lt;p&gt;The usual measure for the spread of a distribution is its &lt;em&gt;standard deviation&lt;/em&gt; \(\sigma\). The wider the distribution, the larger its standard deviation. Precisely, it&amp;rsquo;s the square root of the average squared difference from the mean; roughly speaking, it&amp;rsquo;s the typical distance from the mean. In our setting, that means \(\sigma\) tells us roughly how far players&amp;rsquo; skills lie from the &amp;ldquo;average&amp;rdquo; player: a bigger \(\sigma\) means the field is more spread out, with larger typical gaps between players, which is exactly the sort of quantity we want to look at.&lt;/p&gt;
&lt;p&gt;So from now on, whenever I talk about the &amp;ldquo;amount of skill&amp;rdquo; we see in a game, I&amp;rsquo;ll use the standard deviation of its Elo ratings, \(\sigma\), as the proxy.&lt;/p&gt;
&lt;h3 id="the-problem-with-k"&gt;The problem with K&lt;/h3&gt;
&lt;p&gt;There&amp;rsquo;s an important caveat: Elo ratings and their distribution crucially depend on \(K\), the update factor. If you remember the metaphor from the first article, \(K\) is the number of &amp;ldquo;skill chips&amp;rdquo; the players put into the pot each game. Higher stakes lead to a wider spread; if almost no chips change hands, everyone ends up with roughly the same number and the spread stays tiny.&lt;/p&gt;
&lt;p&gt;A natural idea would be to just fix one \(K\) for all games. Unfortunately, that doesn&amp;rsquo;t work either. Imagine two ladders for exactly the same game with the same population of players. In ladder &lt;em&gt;A&lt;/em&gt;, everyone plays a handful of games per year; in ladder &lt;em&gt;B&lt;/em&gt;, the same people grind hundreds of games a month. We now run Elo with the same \(K\) on both datasets. In ladder &lt;em&gt;A&lt;/em&gt; only a few &amp;ldquo;skill chips&amp;rdquo; ever change hands before the season is over, ratings barely have time to drift apart, and the final distribution stays fairly tight. In ladder &lt;em&gt;B&lt;/em&gt;, chips slosh back and forth for thousands of rounds; random streaks get amplified, the system has time to separate the strong from the weak, and the rating spread ends up much wider. The underlying game and the underlying skills are identical, yet the dispersion of Elo ratings depends heavily on how often people play and how long we observe them. Fixing \(K\) globally doesn&amp;rsquo;t make the spread an intrinsic property of the game — it just bakes in arbitrary design decisions about volume and time.&lt;/p&gt;
&lt;h3 id="calibrating-k-from-the-data"&gt;Calibrating K from the data&lt;/h3&gt;
&lt;p&gt;What Dürsch et al suggest instead is to calibrate \(K\) from the data, so that the Elo ratings are as good as possible at the job they were designed for: predicting who wins. Remember how Elo works: we take the pre-match ratings \(r_A\) and \(r_B\) of two players &lt;em&gt;A&lt;/em&gt; and &lt;em&gt;B&lt;/em&gt;, and feed the difference into a logistic formula to get the predicted win probability for &lt;em&gt;A&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;\[ p_A = \frac{1}{1 + 10^{-(r_A - r_B) / 400}}. \]&lt;/p&gt;
&lt;p&gt;After the match, we compare this prediction with \(s_A\), the actual outcome of the match (\(s_A = 1\) if &lt;em&gt;A&lt;/em&gt; won, \(s_A = 0\) if they lost and \(s_A = 0.5\) in case of a tie), and update:&lt;/p&gt;
&lt;p&gt;\[ r_A \leftarrow r_A + K (s_A - p_A). \]&lt;/p&gt;
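&lt;p&gt;For reference, the two formulas above fit into one small Python function (the name is my own). Player &lt;em&gt;B&lt;/em&gt;&amp;rsquo;s update follows from \(s_B = 1 - s_A\) and \(p_B = 1 - p_A\), so &lt;em&gt;B&lt;/em&gt; loses exactly what &lt;em&gt;A&lt;/em&gt; gains:&lt;/p&gt;

```python
def elo_update(r_a, r_b, s_a, k):
    """One classic Elo update; returns the new ratings of A and B."""
    p_a = 1 / (1 + 10 ** (-(r_a - r_b) / 400))  # predicted win probability for A
    delta = k * (s_a - p_a)
    return r_a + delta, r_b - delta
```

&lt;p&gt;For example, &lt;code&gt;elo_update(0, 0, 1, 32)&lt;/code&gt; returns &lt;code&gt;(16.0, -16.0)&lt;/code&gt;: with equal ratings the prediction is 50%, so the winner collects half of the \(K = 32\) chips.&lt;/p&gt;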
&lt;p&gt;The basic assumption in DLO&amp;rsquo;s approach is that \(K\) is &amp;ldquo;optimal&amp;rdquo; if these prediction errors \(s_A - p_A\) are, on average, as small as possible. For a given \(K\) and a set of matches \(t \in \{1, \dots, T\}\), we look at the squared errors and minimise their mean:&lt;/p&gt;
&lt;p&gt;\[ K^* = \operatorname*{arg\,min}_K \frac{1}{T} \sum_{t=1}^T (s^{(t)} - p^{(t)})^2. \]&lt;/p&gt;
&lt;p&gt;This mean squared error is a standard loss for training machine learning models; when we apply it to probabilistic predictions like this, it&amp;rsquo;s known as the &lt;em&gt;Brier loss&lt;/em&gt;. We can search over \(K\) to find \(K^*\), the update factor that makes Elo&amp;rsquo;s predictions as accurate as possible on that dataset.&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt; Different games (and different datasets) will generally end up with different \(K^*\). Once we&amp;rsquo;ve found \(K^*\), we can run Elo with that value and then look at the resulting rating distributions.&lt;/p&gt;
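&lt;p&gt;Here is a minimal sketch of that calibration. Two details are my assumptions, not necessarily DLO&amp;rsquo;s choices: every player starts at a rating of 0, and \(K^*\) is found by a plain grid search over candidate values. Each prediction is scored &lt;em&gt;before&lt;/em&gt; the ratings are updated with that match&amp;rsquo;s result, so the loss only ever rewards genuine forecasts:&lt;/p&gt;

```python
def brier_loss_for_k(matches, k, initial=0.0):
    """Mean squared error of Elo's pre-match predictions for a given K.

    matches: chronological list of (player_a, player_b, s_a) tuples with
    s_a = 1 (A won), 0 (A lost) or 0.5 (tie).
    ASSUMPTION: unseen players start at the rating `initial`.
    """
    ratings = {}
    total = 0.0
    for a, b, s_a in matches:
        r_a = ratings.get(a, initial)
        r_b = ratings.get(b, initial)
        p_a = 1 / (1 + 10 ** (-(r_a - r_b) / 400))
        total += (s_a - p_a) ** 2  # score the prediction before updating
        ratings[a] = r_a + k * (s_a - p_a)
        ratings[b] = r_b - k * (s_a - p_a)
    return total / len(matches)


def calibrate_k(matches, candidates):
    """ASSUMED grid search for K*: the candidate with the smallest Brier loss."""
    return min(candidates, key=lambda k: brier_loss_for_k(matches, k))
```

&lt;p&gt;With \(K = 0\) nobody&amp;rsquo;s rating ever moves and every prediction is 50%, so on a dataset without ties the Brier loss is exactly 0.25: a handy baseline that a sensible \(K^*\) should improve on whenever results carry some signal.&lt;/p&gt;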
&lt;h3 id="why-k-is-not-our-skill-metric"&gt;Why K* is not our skill metric&lt;/h3&gt;
&lt;p&gt;Before we happily run Elo with \(K^*\) and stare at the resulting distributions, let&amp;rsquo;s pause and ask what \(K^*\) itself is telling us. Earlier I compared \(K\) to a step size in an iterative learning process: a larger \(K\) means we take bigger steps on each update and let a single match pull the ratings around more. If the &amp;ldquo;optimal&amp;rdquo; \(K^*\) is large, doesn&amp;rsquo;t that mean each game is very informative about the players&amp;rsquo; skills? That sounds suspiciously close to what we&amp;rsquo;re trying to measure. Can we just use \(K^*\) as our coveted luck–skill number?&lt;/p&gt;
&lt;p&gt;Not quite. First of all, as we&amp;rsquo;ve already discussed above, the optimal \(K\) depends strongly on the dataset, not just on the game. Larger sets of matches will tend to have smaller \(K^*\), even if the underlying skill levels are exactly the same, simply because with more data you don&amp;rsquo;t need to react as violently to each individual result. That&amp;rsquo;s why we have to calibrate \(K^*\) on the exact dataset we&amp;rsquo;re using.&lt;/p&gt;
&lt;p&gt;Second, two games might demand the same underlying skills, but still have very different learning curves: some are slow and steady, others click after a single &amp;ldquo;epiphany&amp;rdquo;. Those learning dynamics also feed into \(K^*\): in a game where people improve in big jumps, you&amp;rsquo;ll see a different &amp;ldquo;optimal&amp;rdquo; step size than in a game where everyone creeps up gradually. So even if two games are equally skill-based in the end, their \(K^*\) values can be quite different, and comparing them would be misleading.&lt;/p&gt;
&lt;p&gt;Luckily, the standard deviation of the Elo distribution is much more robust to those issues than \(K^*\) itself: it mostly cares about &lt;em&gt;where everyone ends up&lt;/em&gt;, not about how fast they got there.&lt;/p&gt;
&lt;p&gt;We now have the theoretical foundation to compute Elo distributions and their standard deviation. What we still need is actual game data. I&amp;rsquo;ve already teased how this applies to snooker and tennis, and in a later article we&amp;rsquo;ll look at many more concrete examples.&lt;/p&gt;
&lt;h2 id="a-toy-universe-of-luck-and-skill"&gt;A toy universe of luck and skill&lt;/h2&gt;
&lt;p&gt;Before we get there, though, I want to take a closer look at a synthetic example. There are two good reasons for this extra step. First, it gives us a simple little sandbox where we can see what&amp;rsquo;s going on and sanity-check that the method behaves as we expect. Second, it lets us build an excellent benchmark that will help us interpret those fairly abstract standard deviations later on.&lt;/p&gt;
&lt;h3 id="extreme-worlds-pure-chance-vs-pure-skill"&gt;Extreme worlds: pure chance vs pure skill&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s start with two extreme scenarios. First, a game of pure chance, where the winner is literally decided by a coin toss. Second, a game of pure skill, where there is some fixed underlying skill ranking and the stronger player always beats the weaker one. What would the Elo distributions look like in those two imagined worlds?&lt;/p&gt;
&lt;p&gt;In the totally random case, no player ever has any real advantage over another, so the &amp;ldquo;skill chips&amp;rdquo; just get tossed back and forth. Some winning streaks will occur, of course, but in the long run they&amp;rsquo;re balanced by losing streaks. Elo will keep nudging ratings back towards the middle, and everyone&amp;rsquo;s rating will hover near 0. The overall spread \(\sigma\) therefore stays small. It never drops to exactly zero, since every match still moves both ratings by about \(K/2\), but it remains confined to a narrow band.&lt;/p&gt;
&lt;p&gt;In the opposite extreme, there is a fixed skill ranking, and the strongest player always beats everyone else. This top player will keep siphoning rating points from their opponents and never really settle at a final value. Elo is designed so that a win against a much lower-rated opponent earns only a tiny rating change. However, the predicted win probability never quite reaches 100%, so every win still yields a small positive update, and the second-best player in particular keeps handing over points every time they meet. The second-best player in turn keeps gaining points from everyone below them, and so on down the ladder. As a result, the strongest players drift further and further away from the pack, while the weakest ones sink lower and lower. In principle, the spread of ratings can grow without bound.&lt;/p&gt;
&lt;h3 id="the-p-deterministic-game"&gt;The p-deterministic game&lt;/h3&gt;
&lt;p&gt;With the extremes out of the way, we can now blend them into an intermediate case: the &lt;em&gt;\(p\)-deterministic game&lt;/em&gt;. The idea is simple. We fix an underlying skill ranking for all players. Before each match, we flip a weighted coin: with probability \(p \in [0,1]\) we play a game of pure skill, where that ranking decides the winner; with probability \(1-p\) we play a game of pure chance, where the winner is chosen at random. This little &lt;em&gt;Gedankenspiel&lt;/em&gt; is easy to understand and reason about. It gives us an idealised example of a game with &amp;ldquo;roughly \(p\) parts skill and \(1-p\) parts luck&amp;rdquo;, and it serves as the benchmark I promised — something we can later compare real games against. And because the rules are so simple, we can easily run simulations and calculate the resulting Elo distributions:&lt;/p&gt;

&lt;img
	src="https://blog.recommend.games/posts/elo-as-a-skill-o-meter/elo_distribution_p_deterministic.svg"
	alt="Elo distribution plots for various p_deterministic games"
	 /&gt;
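&lt;p&gt;A \(p\)-deterministic game takes only a few lines to simulate. The sketch below shows one way to do it and is not the code behind the plot above. It assumes uniformly random pairings, a fixed \(K = 32\), 50 players and 50,000 matches. All of these are arbitrary choices that will change the absolute values of \(\sigma\).&lt;/p&gt;

```python
import random
import statistics


def expected_score(r_a: float, r_b: float) -> float:
    # Logistic Elo expectation on the usual 400-point scale (assumed here)
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def simulate_p_deterministic(p: float, n_players: int = 50,
                             n_matches: int = 50_000, k: float = 32.0,
                             seed: int = 0) -> list[float]:
    """Play a p-deterministic game and return everyone's final Elo rating.

    Player i has hidden skill rank i (higher index = stronger). Each match is
    a pure-skill game with probability p, otherwise a fair coin toss.
    """
    rng = random.Random(seed)
    ratings = [0.0] * n_players                 # everyone starts at 0
    for _ in range(n_matches):
        a, b = rng.sample(range(n_players), 2)  # two distinct players
        if rng.random() >= p:                   # happens with probability 1 - p
            s_a = rng.choice([0.0, 1.0])        # pure chance
        else:                                   # happens with probability p
            s_a = 1.0 if a > b else 0.0         # the stronger player wins
        expected_a = expected_score(ratings[a], ratings[b])
        ratings[a] += k * (s_a - expected_a)
        ratings[b] -= k * (s_a - expected_a)
    return ratings


for p in (0.0, 0.5, 1.0):
    sigma = statistics.pstdev(simulate_p_deterministic(p))
    print(f"p = {p:.1f}: sigma = {sigma:.0f}")
```

&lt;p&gt;Setting \(p = 0\) recovers the coin-toss world and \(p = 1\) the pure-skill world from the previous section, so the same function covers both extremes.&lt;/p&gt;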

&lt;h3 id="what-simulations-tell-us"&gt;What simulations tell us&lt;/h3&gt;
&lt;p&gt;The first plot already shows the basic pattern: as we turn up \(p\) and let skill matter more often, the Elo distribution gets wider and wider. To make this easier to see, we can just take the standard deviation \(\sigma\) of each distribution and plot it against \(p\):&lt;/p&gt;

&lt;img
	src="https://blog.recommend.games/posts/elo-as-a-skill-o-meter/p_deterministic_vs_sigma_two_players.svg"
	alt="p_deterministic vs σ for two players"
	 /&gt;

&lt;p&gt;The result is a smooth, monotone curve: higher \(p\) consistently leads to a larger Elo spread \(\sigma\). That gives us exactly what we wanted — a way to translate those abstract standard deviations into a more tangible &amp;ldquo;skill fraction&amp;rdquo; \(p\). Later on, when we look at real games, we&amp;rsquo;ll be able to say &amp;ldquo;this game behaves roughly like a 70%-skill world&amp;rdquo; by matching its Elo spread to this benchmark curve.&lt;/p&gt;
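&lt;p&gt;Matching a real game&amp;rsquo;s spread to the benchmark means reading the curve backwards. Here is a minimal sketch. It assumes you already have simulated \((p, \sigma)\) pairs, that \(\sigma\) increases strictly with \(p\), and that the benchmark was simulated under conditions comparable to the real dataset (number of players, \(K\)). The function name and the benchmark numbers in the usage line are made up for illustration and are not simulation results.&lt;/p&gt;

```python
import numpy as np


def skill_fraction_from_sigma(sigma_obs, p_grid, sigma_grid):
    """Read the benchmark curve backwards: observed Elo spread -> p.

    p_grid and sigma_grid come from p-deterministic simulations, and sigma
    must be strictly increasing in p. np.interp interpolates linearly between
    grid points and clips to the end values outside the simulated range.
    """
    p_grid = np.asarray(p_grid, dtype=float)
    sigma_grid = np.asarray(sigma_grid, dtype=float)
    if not np.all(np.diff(sigma_grid) > 0):
        raise ValueError("sigma_grid must be strictly increasing in p")
    return float(np.interp(sigma_obs, sigma_grid, p_grid))


# Made-up benchmark values, purely to show the call:
print(skill_fraction_from_sigma(200.0, [0.0, 0.5, 1.0], [50.0, 100.0, 300.0]))
```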
&lt;h2 id="whats-next"&gt;What&amp;rsquo;s next&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s still one big limitation left: everything so far has assumed two-player games. In the next part of this series, we&amp;rsquo;ll teach Elo to handle real multiplayer tables — the kind we actually have in modern board games — and only then move on to real-world data.&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;Interestingly, the Elo distribution for men&amp;rsquo;s tennis (ATP) looks more similar to the one for snooker than women&amp;rsquo;s tennis.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Remember the \(K=42\) I used in the &lt;a href="https://blog.recommend.games/posts/world-snooker-champion-2025/#how-elo-predicts-the-winners"&gt;snooker article&lt;/a&gt;? I promised I&amp;rsquo;d explain in excruciating depth where it came from, and I think I kept that promise.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item><item><title>A brief introduction to Collaborative Filtering</title><link>https://blog.recommend.games/posts/collaborative-filtering/</link><pubDate>Thu, 09 May 2024 21:47:57 +0300</pubDate><author>contact@recommend.games (Recommend.Games)</author><guid>https://blog.recommend.games/posts/collaborative-filtering/</guid><description>&lt;h2 id="what-is-a-good-recommendation"&gt;What is a good recommendation?&lt;/h2&gt;
&lt;p&gt;Collaborative filtering is the workhorse behind the recommendations on Recommend.Games. Over the years, I&amp;rsquo;ve been asked every now and then how it works, so I thought it was high time I outlined the basic ideas behind our recommendation engine.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s first take a step back and talk about recommendations in general. What is it we&amp;rsquo;re trying to achieve? The answer to this question is far from trivial, and it gets even harder once you try to formalise it. A somewhat naïve approach would be to say that we want to recommend items the user will like. But recommendations are as much about surfacing what users didn&amp;rsquo;t even know they wanted as about predicting what they already want. Sometimes the most &amp;ldquo;correct&amp;rdquo; answer is also the least useful: maybe our #1 recommendation is &lt;a href="https://recommend.games/#/game/266192" style="font-variant: small-caps;"&gt;Wingspan&lt;/a&gt; and the user would indeed love to play it – but if they already knew about it, why recommend it in the first place?&lt;/p&gt;
&lt;p&gt;To be honest, the solution that powers Recommend.Games pretty much ignores all those questions and tackles a much simpler one: given the games a user has rated, can we predict how they would rate all the other games? We can then take the games with the highest predicted ratings and recommend those to the user.&lt;/p&gt;
&lt;h2 id="the-intuition-behind-collaborative-filtering"&gt;The intuition behind collaborative filtering&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make this a little more concrete and assume we only have six users we&amp;rsquo;ll call A through F. They&amp;rsquo;ve left the following ratings:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Game&lt;/th&gt;
 &lt;th style="text-align: center"&gt;A&lt;/th&gt;
 &lt;th style="text-align: center"&gt;B&lt;/th&gt;
 &lt;th style="text-align: center"&gt;C&lt;/th&gt;
 &lt;th style="text-align: center"&gt;D&lt;/th&gt;
 &lt;th style="text-align: center"&gt;E&lt;/th&gt;
 &lt;th style="text-align: center"&gt;F&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;&lt;a href="https://recommend.games/#/game/266192" style="font-variant: small-caps;"&gt;Wingspan&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;&lt;a href="https://recommend.games/#/game/13" style="font-variant: small-caps;"&gt;CATAN&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;&lt;a href="https://recommend.games/#/game/174430" style="font-variant: small-caps;"&gt;Gloomhaven&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;9&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;10&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;&lt;a href="https://recommend.games/#/game/12333" style="font-variant: small-caps;"&gt;Twilight Struggle&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
 &lt;td style="text-align: center"&gt;&lt;em&gt;?&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Our task is to find a model to fill in the blanks. How would we go about this? The basic idea behind collaborative filtering is to find users with similar tastes and use their ratings to predict the missing ones. Let&amp;rsquo;s take a closer look at users &lt;strong&gt;C&lt;/strong&gt; and &lt;strong&gt;E&lt;/strong&gt;. They agree fairly well on &lt;a href="https://recommend.games/#/game/13" style="font-variant: small-caps;"&gt;CATAN&lt;/a&gt; (6 vs 5) and &lt;a href="https://recommend.games/#/game/12333" style="font-variant: small-caps;"&gt;Twilight Struggle&lt;/a&gt; (5 vs 6), so it&amp;rsquo;s a fair guess that user &lt;strong&gt;E&lt;/strong&gt; would rate &lt;a href="https://recommend.games/#/game/266192" style="font-variant: small-caps;"&gt;Wingspan&lt;/a&gt; similarly to user &lt;strong&gt;C&lt;/strong&gt;. But we can also make this argument &amp;ldquo;in the other direction&amp;rdquo;: users generally seem to rate &lt;a href="https://recommend.games/#/game/13" style="font-variant: small-caps;"&gt;CATAN&lt;/a&gt; and &lt;a href="https://recommend.games/#/game/12333" style="font-variant: small-caps;"&gt;Twilight Struggle&lt;/a&gt; similarly, so one game&amp;rsquo;s ratings should be a good predictor for the other&amp;rsquo;s. Since user &lt;strong&gt;F&lt;/strong&gt; seems to dislike &lt;a href="https://recommend.games/#/game/13" style="font-variant: small-caps;"&gt;CATAN&lt;/a&gt;, it&amp;rsquo;s a fair assumption that they would also dislike &lt;a href="https://recommend.games/#/game/12333" style="font-variant: small-caps;"&gt;Twilight Struggle&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="matrix-factorisation-via-alternating-least-squares"&gt;Matrix factorisation via alternating least squares&lt;/h2&gt;
&lt;p&gt;That&amp;rsquo;s the high-level idea behind collaborative filtering – but how do we actually implement it? If you know your Machine Learning 101, you might be familiar with linear regression, which tries to find a &amp;ldquo;line of best fit&amp;rdquo; for a set of given data points. We can think of collaborative filtering as linear regression in two directions. If we keep the games&amp;rsquo; parameters fixed and try to predict the users&amp;rsquo; ratings, we&amp;rsquo;re doing linear regression in the user space. If we keep the users&amp;rsquo; parameters fixed and try to predict the games&amp;rsquo; ratings, we&amp;rsquo;re doing linear regression in the game space. We need to learn both sides, and we do so by alternating between the two regressions, each time holding one side fixed. That&amp;rsquo;s why this method is known as &lt;em&gt;alternating least squares&lt;/em&gt; (ALS, &amp;ldquo;least squares&amp;rdquo; referring to minimising the squared error).&lt;/p&gt;
&lt;p&gt;After running the algorithm, we end up with one vector for each user and one for each game. Taking the inner product (also known as the dot product) of a user vector and a game vector yields a single number: the predicted rating. If we use the user vectors as the columns of a matrix \(U\) and the game vectors as the columns of a matrix \(G\), we can multiply those two matrices to get a matrix \(R\) of predicted ratings, with one row per user and one column per game:&lt;/p&gt;
&lt;p&gt;\[
R = U^\top \cdot G.
\]&lt;/p&gt;
&lt;p&gt;In other words: we&amp;rsquo;ve taken the matrix of ratings from the table above and (approximately) factored it into two matrices. So another way of thinking about collaborative filtering is as a matrix factorisation problem. We&amp;rsquo;re trying to find two matrices that, when multiplied together, approximate the original matrix as closely as possible on the ratings we actually know. Their product then also fills in the blanks.&lt;/p&gt;
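&lt;p&gt;Here is a minimal NumPy sketch of alternating least squares on the toy table above. It uses plain ALS with L2 regularisation and no bias terms. The two latent factors, the regularisation strength and the iteration count are arbitrary assumptions, and this is not necessarily what Recommend.Games runs in production. Following the formula \(R = U^\top \cdot G\), user and game vectors are stored as the columns of \(U\) and \(G\).&lt;/p&gt;

```python
import numpy as np

nan = np.nan
# The toy table above, transposed: one row per user (A-F), one column per
# game (Wingspan, CATAN, Gloomhaven, Twilight Struggle); NaN = not rated.
R = np.array([
    [10,  nan, 9,   nan],  # A
    [nan, 8,   nan, 7],    # B
    [3,   6,   nan, 5],    # C
    [7,   nan, 10,  nan],  # D
    [nan, 5,   nan, 6],    # E
    [nan, 2,   8,   nan],  # F
])


def als(R, n_factors=2, reg=0.1, n_iters=50, seed=0):
    """Alternating least squares; returns U (factors x users), G (factors x games)."""
    observed = ~np.isnan(R)
    n_users, n_games = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_factors, n_users))
    G = rng.normal(scale=0.1, size=(n_factors, n_games))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Hold G fixed: each user vector is a small ridge regression
        # on the ratings that user actually gave.
        for u in range(n_users):
            Gu = G[:, observed[u]]
            U[:, u] = np.linalg.solve(Gu @ Gu.T + ridge, Gu @ R[u, observed[u]])
        # Hold U fixed: the same regression for each game vector.
        for g in range(n_games):
            Ug = U[:, observed[:, g]]
            G[:, g] = np.linalg.solve(Ug @ Ug.T + ridge, Ug @ R[observed[:, g], g])
    return U, G


U, G = als(R)
pred = U.T @ G  # R = U^T G: a predicted rating for every (user, game) pair
print(np.round(pred, 1))
print("E on Wingspan:", round(float(pred[4, 0]), 1))
```

&lt;p&gt;Row 4, column 0 of &lt;code&gt;pred&lt;/code&gt; is the model&amp;rsquo;s guess for how user E would rate Wingspan. With only 13 known ratings this is purely illustrative. A real model needs far more data, and production systems often add per-user and per-game bias terms as well.&lt;/p&gt;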
&lt;h2 id="latent-factors-and-embeddings"&gt;Latent factors and embeddings&lt;/h2&gt;
&lt;p&gt;Those user and game vectors are also known as &lt;em&gt;latent factors&lt;/em&gt;. They&amp;rsquo;re &amp;ldquo;latent&amp;rdquo; (hidden) because nobody observes them directly: the model learns them from the ratings, projecting the high-dimensional space of users and games into a lower-dimensional one. While those latent dimensions don&amp;rsquo;t come with any predefined, human-interpretable meaning, they capture the essence of what makes a user like a game. For example, one latent factor might roughly track how much a user likes games with a lot of player interaction, while another might track how much they like games with a lot of strategic depth. Note that we can freely choose the number of latent factors, which makes it an important hyperparameter of the model. The more latent factors we use, the more expressive the model can be, but it also becomes more prone to overfitting. Recommend.Games uses 32 latent dimensions, which is a decent default choice for many recommendation problems.&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;The latent factors also have some interesting properties. For example, there&amp;rsquo;s a meaningful notion of similarity between them: if two users are close together in the latent space, they have similar tastes. The measure in question is the cosine similarity, i.e. the cosine of the angle between two vectors. Think of it this way: if the vectors for two users point in the same direction (cosine similarity close to 1), they have similar tastes. If they point in opposite directions (cosine similarity close to -1), they have opposite tastes. If they&amp;rsquo;re orthogonal (cosine similarity close to 0), their tastes are unrelated.&lt;/p&gt;
&lt;p&gt;The same goes for the game vectors: we can calculate the cosine similarity between two games to see if they appeal to the same users. Every game page on Recommend.Games has a &amp;ldquo;You might also like&amp;rdquo; section at the bottom. Those are the games with the highest cosine similarity to the game in question.&lt;/p&gt;
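&lt;p&gt;Cosine similarity and a &amp;ldquo;You might also like&amp;rdquo; lookup can be sketched in a few lines of NumPy. This assumes the game vectors are stored as the columns of a matrix \(G\), matching the formula \(R = U^\top \cdot G\). The function names and the toy vectors are hypothetical, and the real site may well do this differently.&lt;/p&gt;

```python
import numpy as np


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction,
    0 = orthogonal, -1 = opposite directions."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def most_similar(G: np.ndarray, game: int, top_n: int = 5) -> list[int]:
    """Indices of the top_n games (columns of G) with the highest cosine
    similarity to column `game`, excluding the game itself."""
    normed = G / np.linalg.norm(G, axis=0, keepdims=True)  # unit-length columns
    sims = normed.T @ normed[:, game]   # cosine similarity to every game
    sims[game] = -np.inf                # never recommend the game itself
    return [int(i) for i in np.argsort(-sims)[:top_n]]


# Toy 2-dimensional game vectors, made up for illustration (one per column).
G_demo = np.array([
    [1.0, 0.9, 0.0, -1.0],
    [0.0, 0.1, 1.0,  0.0],
])
print(most_similar(G_demo, game=0, top_n=2))  # prints [1, 2]
```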
&lt;p&gt;Vector representations with some measure of distance are called &lt;em&gt;embeddings&lt;/em&gt;. So this is yet another way of thinking about collaborative filtering: we&amp;rsquo;re embedding users and games into a latent space where we can measure their similarity.&lt;/p&gt;
&lt;p&gt;So, this was a pretty lengthy and technical article, but I hope it provides some intuition of how the recommendation engine behind Recommend.Games works. Obviously, there are a lot more details to it, but those are the most important ideas. One of the reasons why I wanted to talk about this topic is that we can have more fun with those latent factors, so stay tuned for more articles on this topic!&lt;/p&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;The number of latent dimensions also affects the size of the model. As mentioned, we have one latent vector for every user and every game, each with 32 entries. Currently, we have over 500 thousand users and over 90 thousand games, so the matrix \(U\) has over \(500\,000 \cdot 32 = 16\,000\,000\) entries and the matrix \(G\) has over \(90\,000 \cdot 32 = 2\,880\,000\) entries. Each entry is a 64-bit (8-byte) floating-point number, so the model takes up roughly \((16\,000\,000 + 2\,880\,000) \cdot 8\,\text{B} \approx 150\,\text{MB}\). That&amp;rsquo;s not a negligible size to load into the memory of a cheap server (which is what we&amp;rsquo;re using), but in the modern age of deep neural networks with gazillions of parameters, it feels almost tiny.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</description></item></channel></rss>