Prisoner's Dilemma


Close your eyes and imagine the following scenario.

You, Archibald the Unexceptional, and another prisoner of war, Bartholomew the Nothing-To-Write-Home-About, have just been captured in the Third Great Sesame Street War. It’s really not so bad in Cookie Monster’s Fungeon, except all you have to eat are cookies. Furthermore, Cookie Monster makes you play games to receive food every day. Are your eyes still closed? Good. The immersion is necessary.

Today, the rules of the game are simple. You and Bartholomew both have two options: cooperate or swindle. If you both decide to cooperate, each of you receives 2 cookies. However, if you swindle and Bartholomew cooperates, then you receive 4 cookies and Bartholomew receives 0 (the opposite is true if Bartholomew swindles and you cooperate). If you both decide to swindle, each of you receives 1 cookie.

To better visualize the outcomes of the game, you invent the payoff matrix:

                          Bartholomew
                       Cooperate   Swindle
Archibald   Cooperate    2, 2        0, 4
            Swindle      4, 0        1, 1

Conventionally, the payoff listed first belongs to the player who effectively chooses the row of the outcome (Archibald, in this example).
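If you would rather etch the matrix into a computer than into a wall, here is a minimal Python sketch of one way to store it. The names PAYOFFS and cookies are invented for this illustration; each entry lists Archibald's cookies first and Bartholomew's second, following the row-player convention above.

```python
# Payoffs indexed by (Archibald's action, Bartholomew's action).
# Each value is (Archibald's cookies, Bartholomew's cookies).
PAYOFFS = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "swindle"): (0, 4),
    ("swindle", "cooperate"): (4, 0),
    ("swindle", "swindle"): (1, 1),
}


def cookies(archibald_action, bartholomew_action):
    """Return (Archibald's cookies, Bartholomew's cookies) for one round."""
    return PAYOFFS[(archibald_action, bartholomew_action)]


print(cookies("swindle", "cooperate"))  # (4, 0)
```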

What will you choose?

Cooperate
Swindle

You decide to cooperate. How virtuous. Lucky for you, so does Bartholomew. But, as you and Bartholomew each enjoy your double-cookie snack, you feel a deep sense of regret. If I had just decided to swindle, those cookies would have been mine, you think. You glance at Bartholomew, whose agonized physiognomy reveals that he shares your regret. You vow never to make the same mistake again, for both your and Bartholomew’s sake.

But is there any possible outcome in which neither you nor Bartholomew feels as though you could have done better? After some thought, you realize that if both you and Bartholomew choose to swindle, then neither of you could have done better by deviating individually.

You decide to swindle, you sneaky devil. Unfortunately, so does Bartholomew. But, as you and Bartholomew each somberly masticate your disappointing single cookies, you are fulfilled in knowing that you did the best you could given Bartholomew’s selfishness. Even if you had both cooperated to receive more cookies, you would have regretted not swindling. You glance at Bartholomew, whose giddy face reveals that he shares your twisted glory.

Paradoxically, this outcome is worse for both of you than mutual cooperation: 1 cookie each instead of 2. Nonetheless, you have a feeling that it’s very important to analyze the outcomes of games where no player could have done better by changing only their own action. You name your new concept the Nash equilibrium, fondly dedicated to your favorite basketball player. You etch the following words into your doughy wall:

I hereby define a Nash equilibrium as an outcome of a game where no player regrets their action. -Archibald the Unexceptional, 2025

We say that an action is a best response for a player if they could not have benefitted from switching to a different action, given everyone else's action.

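Symbolically, a best response for a player $i$ to the actions $a_{-i}$ of the other players is an action $a_i^*$ such that for every other $a_i \in A_i$, we have $u_i(a_i^*, a_{-i}) \geq u_i(a_i, a_{-i})$, where $A_i$ is the set of actions available to player $i$ and $u_i$ is player $i$'s payoff. We define a Nash equilibrium as an action profile $a^*$ where every player $i$ plays a best response $a_i^*$ to $a_{-i}^*$.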
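For a game as small as the cookie game, this definition can be checked by brute force: try every action profile and keep the ones where no player gains by changing only their own action. The sketch below is one way to do that, not a general-purpose solver. The names ACTIONS, PAYOFFS, utility, is_best_response, and pure_nash_equilibria are all invented for this illustration, and the code only considers the two players and two actions of the game above.

```python
from itertools import product

ACTIONS = ("cooperate", "swindle")

# PAYOFFS[(a, b)] = (Archibald's cookies, Bartholomew's cookies)
# when Archibald plays a and Bartholomew plays b.
PAYOFFS = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "swindle"): (0, 4),
    ("swindle", "cooperate"): (4, 0),
    ("swindle", "swindle"): (1, 1),
}


def utility(player, profile):
    """Cookies that player (0 = Archibald, 1 = Bartholomew) gets from profile."""
    return PAYOFFS[profile][player]


def is_best_response(player, profile):
    """True if player cannot gain by changing only their own action."""
    current = utility(player, profile)
    for alternative in ACTIONS:
        deviation = list(profile)
        deviation[player] = alternative  # everyone else's action stays fixed
        if utility(player, tuple(deviation)) > current:
            return False
    return True


def pure_nash_equilibria():
    """All action profiles in which every player plays a best response."""
    return [
        profile
        for profile in product(ACTIONS, repeat=2)
        if all(is_best_response(player, profile) for player in (0, 1))
    ]


print(pure_nash_equilibria())  # [('swindle', 'swindle')]
```

Mutual swindling is the only profile that survives the check, which matches what you etched into the wall.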

In fact, no matter what Bartholomew decided to do, swindling would always be a best response for you. Symmetrically, swindling is always a best response for Bartholomew. So, it makes sense that the Nash equilibrium occurs when both players play a best response to each other, as by definition, neither of you would regret your action. Because swindling earns you strictly more cookies than cooperating no matter what Bartholomew does, we call cooperating a strictly dominated strategy. A strictly dominated strategy is never played in a Nash equilibrium, because the player could always do strictly better by switching. However, finding equilibria won’t always be so obvious.
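Dominance can be checked mechanically too. The sketch below uses the standard definition of strict dominance (one action pays a player strictly more than another against every action the opponent might take) and applies it to the cookie game. The names ACTIONS, PAYOFFS, and strictly_dominates are invented for this illustration.

```python
ACTIONS = ("cooperate", "swindle")

# PAYOFFS[(a, b)] = (Archibald's cookies, Bartholomew's cookies)
# when Archibald plays a and Bartholomew plays b.
PAYOFFS = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "swindle"): (0, 4),
    ("swindle", "cooperate"): (4, 0),
    ("swindle", "swindle"): (1, 1),
}


def strictly_dominates(player, better, worse):
    """True if better pays player strictly more than worse against every opposing action.

    player is 0 for Archibald (row) or 1 for Bartholomew (column).
    """
    for other in ACTIONS:
        if player == 0:
            gain = PAYOFFS[(better, other)][0] - PAYOFFS[(worse, other)][0]
        else:
            gain = PAYOFFS[(other, better)][1] - PAYOFFS[(other, worse)][1]
        if gain <= 0:
            return False
    return True


print(strictly_dominates(0, "swindle", "cooperate"))  # True
print(strictly_dominates(1, "swindle", "cooperate"))  # True
print(strictly_dominates(0, "cooperate", "swindle"))  # False
```

Swindling strictly dominates cooperating for both prisoners, so cooperating is strictly dominated for each of you.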

In game theory and popular culture, this game is known as the Prisoner’s Dilemma. In the original problem, a prison guard incentivizes two criminal partners to testify against each other, with a reduced sentence as the payoff. Variations of the same fundamental game have widespread applications in the real world. A commonly known example is the arms race, where swindling is analogous to stockpiling nuclear warheads and cooperation is analogous to reducing one's supply of warheads.
