# Prisoner’s Dilemma: Where Equilibrium Is Not Always Socially Optimal

The Prisoner’s Dilemma is a popular scenario, enacted countless times in modern cinema and television shows. The setup is simple.

Two suspects, Shawn and Gus, are arrested by the police. The police have insufficient evidence for a conviction and, having separated the prisoners, visit each of them to offer the same deal: if one testifies (defects) for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both remain silent, both prisoners are sentenced to only six months in jail on a minor charge.

If each betrays the other, each receives a five-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation. How should the prisoners act?

A prisoner’s best action seems to depend a lot on how the other prisoner reacts to the same situation. Game theory can be used to predict the outcome. However, as we will see, the Prisoner’s Dilemma is a bit unusual.

| Sentence for (Shawn, Gus) | Gus defects | Gus remains silent |
| --- | --- | --- |
| Shawn defects | 5 years, 5 years | 0 years, 10 years |
| Shawn remains silent | 10 years, 0 years | 6 months, 6 months |

From the payoff matrix, if they were great friends and trusted each other, each would know that the other wouldn’t defect. They could then play a cooperative strategy, with both remaining silent and being sentenced to 6 months each. This is a Pareto optimal outcome where both benefit.

A Pareto optimal outcome is one where no one can be made better off without making someone else worse off. Both remaining silent (Silent, Silent) is such an outcome, and with a combined sentence of 1 year it is also the best one for the pair as a whole. For Shawn to do better (go free), he has to defect, which makes Gus worse off than before.
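The definition above can be checked mechanically against the payoff matrix. The following is a minimal sketch, not part of the original post: the names `SENTENCES`, `dominates`, and `pareto_optimal` are made up for illustration, and sentences are written in years (6 months = 0.5). One thing the check brings out is that the two lopsided outcomes also pass the test, because the prisoner who goes free cannot be made any better off; (Silent, Silent) is the Pareto optimal outcome with the smallest combined sentence.

```python
# Sentences in years: SENTENCES[(shawn_move, gus_move)] = (shawn_years, gus_years).
SENTENCES = {
    ("defect", "defect"): (5.0, 5.0),
    ("defect", "silent"): (0.0, 10.0),
    ("silent", "defect"): (10.0, 0.0),
    ("silent", "silent"): (0.5, 0.5),
}


def dominates(a, b):
    """True if outcome a is no worse for both prisoners and strictly better for one.

    Shorter sentences are better, so "better" means "smaller".
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(outcomes):
    """Return the move pairs whose sentences are not dominated by any other outcome."""
    return [
        moves
        for moves, years in outcomes.items()
        if not any(dominates(other, years) for other in outcomes.values())
    ]


optimal = pareto_optimal(SENTENCES)
print(optimal)

# Among the Pareto optimal outcomes, pick the smallest combined sentence.
best_for_pair = min(optimal, key=lambda moves: sum(SENTENCES[moves]))
print(best_for_pair)
```

The only outcome that fails the test is (Defect, Defect), since both prisoners would get shorter sentences under (Silent, Silent).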

However, if both are rational and unsure of the other’s reaction, each will try to maximize his individual payoff. Each will then defect since, irrespective of the other person’s decision, defecting always results in a shorter sentence (5 years vs. 10 years if the other defects, or 0 vs. 6 months if the other stays silent). Defecting is therefore the dominant strategy for both, and the outcome is (Defect, Defect).
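The dominance argument can be spelled out as a small check over the payoff matrix. This is only a sketch under the post's numbers: `SENTENCES`, `MOVES`, and `shawn_dominant_move` are illustrative names, and sentences are in years.

```python
# Sentences in years: SENTENCES[(shawn_move, gus_move)] = (shawn_years, gus_years).
SENTENCES = {
    ("defect", "defect"): (5.0, 5.0),
    ("defect", "silent"): (0.0, 10.0),
    ("silent", "defect"): (10.0, 0.0),
    ("silent", "silent"): (0.5, 0.5),
}
MOVES = ("defect", "silent")


def shawn_dominant_move():
    """Return the move that gives Shawn a strictly shorter sentence than every
    alternative, whatever Gus does, or None if no such move exists."""
    for candidate in MOVES:
        alternatives = [move for move in MOVES if move != candidate]
        if all(
            SENTENCES[(candidate, gus)][0] < SENTENCES[(alternative, gus)][0]
            for alternative in alternatives
            for gus in MOVES
        ):
            return candidate
    return None


print(shawn_dominant_move())  # defect
```

The matrix is symmetric, so the same check with the roles swapped gives the same answer for Gus.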

The Nash equilibrium is this self-interest-maximizing outcome: both confess and end up with a sentence of 5 years each. The beauty of the non-iterated Prisoner’s Dilemma is that this equilibrium, unlike the cooperative outcome above, is not Pareto optimal: both would be better off if both stayed silent.
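One way to see that this is the only equilibrium is to test every cell of the matrix for profitable unilateral deviations. The sketch below assumes the post's sentences in years; `SENTENCES`, `MOVES`, and `is_nash` are hypothetical names chosen for this example.

```python
# Sentences in years: SENTENCES[(shawn_move, gus_move)] = (shawn_years, gus_years).
SENTENCES = {
    ("defect", "defect"): (5.0, 5.0),
    ("defect", "silent"): (0.0, 10.0),
    ("silent", "defect"): (10.0, 0.0),
    ("silent", "silent"): (0.5, 0.5),
}
MOVES = ("defect", "silent")


def is_nash(shawn, gus):
    """True if neither prisoner can shorten his own sentence by switching alone."""
    shawn_years, gus_years = SENTENCES[(shawn, gus)]
    # Shawn considers switching while Gus's move stays fixed.
    shawn_stays = all(SENTENCES[(alt, gus)][0] >= shawn_years for alt in MOVES)
    # Gus considers switching while Shawn's move stays fixed.
    gus_stays = all(SENTENCES[(shawn, alt)][1] >= gus_years for alt in MOVES)
    return shawn_stays and gus_stays


equilibria = [(s, g) for s in MOVES for g in MOVES if is_nash(s, g)]
print(equilibria)  # [('defect', 'defect')]
```

(Silent, Silent) fails the test because either prisoner can cut his sentence from 6 months to nothing by defecting alone.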

The economic moral of the puzzle is that a group whose members pursue rational self-interest may all end up worse off than a group whose members act contrary to rational self-interest. More generally, if the payoffs are not assumed to represent self-interest, a group whose members rationally pursue any goals may all meet less success than if they had **not** rationally pursued their goals individually.

Simply put, in most situations at least one of them will confess (driven by self-interest), so the cops end up with someone to charge and they solve the case :)

Now, look around you for instances of the Prisoner’s Dilemma. You will be surprised how common they are. But then, that’s for the next post.

For the math lovers,

If Shawn and Gus were two rational mathematicians, each would estimate the probability of the other person defecting and decide based on his expected sentence. That probability would depend on how much they trust each other. Sadly, in the Prisoner’s Dilemma, defecting is the dominant move irrespective of the probabilities.

It can easily be shown that Shawn’s expected sentence is always shorter if he defects, which means his payoff from defecting is always higher.

Let x be the probability, between 0 and 1, that Gus defects. Then, in years:

Sentence for Shawn if Shawn is silent > Sentence for Shawn if Shawn defects

10x + 0.5(1 - x) > 5x + 0(1 - x)

4.5x + 0.5 > 0

which holds for every value of x between 0 and 1.

Thus, irrespective of Gus’s move, Shawn will always defect.
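For anyone who prefers numbers to algebra, here is a rough sketch that evaluates both sides of the inequality for a few values of x. The function name `expected_sentence` and the chosen grid of probabilities are assumptions made for this example; the sentences are the ones from the matrix, in years.

```python
def expected_sentence(shawn_move, x):
    """Shawn's expected sentence in years when Gus defects with probability x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a probability between 0 and 1")
    if shawn_move == "silent":
        # 10 years if Gus defects, 6 months if Gus also stays silent.
        return 10.0 * x + 0.5 * (1.0 - x)
    if shawn_move == "defect":
        # 5 years if Gus defects too, free if Gus stays silent.
        return 5.0 * x + 0.0 * (1.0 - x)
    raise ValueError("shawn_move must be 'silent' or 'defect'")


gaps = {}
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    silent = expected_sentence("silent", x)
    defect = expected_sentence("defect", x)
    gaps[x] = silent - defect  # equals 4.5x + 0.5
    print(f"x = {x:.2f}: silent {silent:.3f}, defect {defect:.3f}, gap {gaps[x]:.3f}")
```

The gap never drops below half a year (at x = 0) and grows to five years at x = 1, so staying silent is worse for Shawn at every probability.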