# How to beat your friends at Bohnanza

Rio Grande Games has a great game called Bohnanza. The objective of the game is to make the most money by planting different bean fields, where different types of beans require a different number of beans to get to the next value level. There are 4 values levels (of 1, 2, 3, and 4 coins) in the game with not every type of bean having each level. Different beans have different frequencies in the deck which roughly correspond to how many of them you need to plant to get to the next level of value.

*Data collection. Large number on the card is the bean’s frequency in the deck, with the different point levels below indicated by a stack of 1 to 4 coins.*

*Bean types with their frequency and number of beans needed to reach different point levels.*

| Type | Frequency | P1 | P2 | P3 | P4 |

|————|———–|—-|—-|—-|—-|

| Coffee | 24 | 4 | 7 | 10 | 12 |

| Wax | 22 | 4 | 7 | 9 | 11 |

| Blue | 20 | 4 | 6 | 8 | 10 |

| Chili | 18 | 3 | 6 | 8 | 9 |

| Stink | 16 | 3 | 5 | 7 | 8 |

| Green | 14 | 3 | 5 | 6 | 7 |

| Soy | 12 | 2 | 4 | 6 | 7 |

| Black-eyed | 10 | 2 | 4 | 5 | 6 |

| Red | 8 | 2 | 3 | 4 | 5 |

| Garden | 6 | 0 | 2 | 3 | 0 |

| Cocoa | 4 | 0 | 2 | 3 | 4 |

We assume this relationship roughly exists with as frequency decreases value (or value per card) increases. We calculate value per card as the number of coins divided by the number of bean cards needed to reach that level. We can then plot this with a regression line overlaid and see it is roughly linear relationship.

*Linear relationship between frequency and value per card.*

Because it takes you different amount of cards to get to the different point levels of each bean type, I assumed it was unlikely that they were balanced perfectly. So, if you could find which cards have higher probability of getting to the different coin levels first, you could exploit this to beat your friends at the game.

From here, I calculated the conditional probability of getting to each level. I assumed you would not be replacing the cards in the deck after one is drawn (which makes sense for the first round of the game). So the conditional probability is P(A) * P(B|A), where P(A) is the probability of drawing the first bean of a certain type, and P(B|A) is the probability of drawing another bean of that type. I continue this on to reach the different numbers of beans required for each point level (which ranges from 2 beans to 12 beans).

For example, to get 2 coins from a coffee bean farm you need 7 coffee
beans. The probability of that is:

24/154 * 23/153 * 22/152 * 21/151 * 20/150 * 19/149 * 18/148 = 9.75e-07

When we plot this out with probability on the x-axis and point values on the y-axis it looks like this:

*Probability of achieving goal vs Coins.*

But what we should look for is the beans at the right end of each different point level as ‘best performers’ and the the beans at the left end of each level as ‘worst performers’. From this, we can see that the odds of getting to a top level of the coffee bean (3 or 4 coins) is VERY LOW, even though they are super common in the deck (~16% of the deck at the start). And if you want to just get to the first coin level as easy as possible, the Soy Bean is the way to go.

*Probability of achieving goal vs Coins with ‘outliers’ highlighted.*

I think there are some other metrics that could synthesize the trade-offs between value per card, expected return, and probability, but this is still a work in progress. If you have any ideas, feel free to reach out or build on the analysis yourself. All code is available here.