(One or Multi-Armed) Bandit Problem Models
- Primary Authors:
- Additional Contributors: None
- Related Models: None
Suggested Citation: See license page.
Based originally on the problem of a gambler facing many one-armed bandits (or one multi-armed bandit), this model provides a parsimonious framework in which to explore maximizing behaviour in an environment where an agent must trade off exploration (learning about the payoffs of a given arm) against exploitation (pulling the arm with the currently maximal expected payoff).
For more details see this summary paper on Bandit Problems.
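As a concrete illustration of the exploration/exploitation trade-off, here is a minimal sketch (not from the literature above) of an epsilon-greedy agent on a Bernoulli multi-armed bandit; the arm payoff probabilities, epsilon, and horizon are all hypothetical choices:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    true_means: hypothetical success probability of each arm (unknown to the agent).
    With probability epsilon the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest sample mean so far.
    Returns (total payoff, pull counts per arm).
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n      # pulls per arm
    means = [0.0] * n     # running sample mean of each arm's payoffs
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                       # explore
        else:
            arm = max(range(n), key=lambda a: means[a])  # exploit
        payoff = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (payoff - means[arm]) / counts[arm]  # incremental mean
        total += payoff
    return total, counts
```

Even this simple rule shows the tension the model captures: time spent sampling a poorly understood arm is time not spent on the arm that currently looks best.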
Models and Results
Social Learning in One-Armed Models
Rosenberg, Solan and Vieille: "study a two-player one-arm bandit problem in discrete time, in which the risky arm can have two possible types, high and low, the decision to stop experimenting is irreversible, and players observe each other's actions but not each other's payoffs. We prove that all equilibria are in cutoff strategies and provide several qualitative results on the sequence of cutoffs."
- These types of games display informational externalities: my further experimentation gives you information about the payoffs. However, this paper does not explore welfare. As the authors point out, allowing decisions to be reversible permits free-riding, and this may change equilibrium behaviour.
- Cutoffs behave as one would expect (decreasing over time if both players are still in).
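The cutoff idea can be sketched in a single-agent simplification (not the paper's two-player equilibrium): an agent pulls a risky arm whose type is high or low, updates a posterior belief by Bayes' rule after each Bernoulli payoff, and stops irreversibly once the belief drops below a cutoff. All parameter values here are hypothetical:

```python
import random

def cutoff_experiment(p_high=0.7, p_low=0.3, prior=0.5, cutoff=0.2,
                      true_type="low", horizon=50, seed=1):
    """One-armed bandit whose risky arm has an unknown type (high or low).

    q is the posterior belief that the arm is high, updated by Bayes'
    rule after each observed payoff. Experimentation stops irreversibly
    once q falls below the cutoff.
    Returns (rounds of experimentation, final belief).
    """
    rng = random.Random(seed)
    p = p_high if true_type == "high" else p_low
    q = prior
    for t in range(horizon):
        if q < cutoff:          # irreversible stop: switch to the safe arm
            return t, q
        success = rng.random() < p
        # Bayes update on the observed payoff
        like_h = p_high if success else 1 - p_high
        like_l = p_low if success else 1 - p_low
        q = q * like_h / (q * like_h + (1 - q) * like_l)
    return horizon, q
```

In the two-player game the cutoff at each date also conditions on whether the other player is still experimenting, since staying in is itself informative; this sketch omits that channel.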
- Analogies with Real_Options_Models.
- Analogies with Herding_Models, at least when the number of players is large (since, at least asymptotically, the dropping out or staying in of other players fully reveals the true value of theta).