Article Summary: Noisy Directional Learning and the Logit Equilibrium

The paper is here (ungated).  The ideas.repec entry is here.  I believe that this (1999) was an early version of the same.  The authors are Simon P. Anderson [Ideas, Virginia], Jacob K. Goeree [Ideas, CalTech] and Charles A. Holt [Ideas, Virginia].  The full reference is:

Anderson, Simon P.; Goeree, Jacob K. and Holt, Charles A., “Noisy Directional Learning and the Logit Equilibrium.” Scandinavian Journal of Economics, Special Issue in Honor of Reinhard Selten, 106(3), September 2004, pp. 581-602.

The abstract:

We specify a dynamic model in which agents adjust their decisions toward higher payoffs, subject to normal error. This process generates a probability distribution of players’ decisions that evolves over time according to the Fokker–Planck equation. The dynamic process is stable for all potential games, a class of payoff structures that includes several widely studied games. In equilibrium, the distributions that determine expected payoffs correspond to the distributions that arise from the logit function applied to those expected payoffs. This ‘‘logit equilibrium’’ forms a stochastic generalization of the Nash equilibrium and provides a possible explanation of anomalous laboratory data.

This is a model of bounded rationality inspired, in part, by experimental results.  It provides a stochastic equilibrium (i.e. a distribution over choices) that need not coincide with, nor even be centred around, the Nash equilibrium.  The summary is below the fold.

Games are played in continuous time.  There are at least two players, all sharing the same one-dimensional action set:  x_{i}\left(t\right)\in\left(x_{L},x_{H}\right).  The instantaneous expected payoff for player i at time t depends on her chosen action and the distribution of the other players’ decisions:

\pi^{e}_{i}\left(x_{i}\left(t\right),t\right)=\int\pi_{i}\left(x_{i}\left(t\right),x_{-i}\right)dF_{-i}\left(x_{-i},t\right)

where F_{i}\left(x_{i},t\right) is the probability that player i chooses an action less than or equal to x_{i} at time t, and items with a subscript of -i denote vectors of the relevant items for the n-1 other players.
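To make the expected-payoff object concrete, here is a small Monte Carlo sketch of my own (not from the paper), assuming a hypothetical two-player quadratic payoff and a uniform stand-in for the rival’s decision distribution:

```python
import numpy as np

def expected_payoff(x_i, payoff, rival_samples):
    """Monte Carlo estimate of pi^e_i(x_i, t): average the payoff of action
    x_i over draws from the rival's current decision distribution F_{-i}."""
    return np.mean([payoff(x_i, x_j) for x_j in rival_samples])

# Hypothetical two-player payoff: pi_i(x_i, x_j) = x_i * (1 - x_i - x_j)
payoff = lambda x_i, x_j: x_i * (1.0 - x_i - x_j)

rng = np.random.default_rng(0)
rival_samples = rng.uniform(0.0, 1.0, size=10_000)  # stand-in for F_{-i} at time t
print(expected_payoff(0.3, payoff, rival_samples))  # approx 0.3 * (1 - 0.3 - 0.5) = 0.06
```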

Players’ rationality is bounded in that they only adjust their actions locally by moving in the direction of increasing expected payoff (as opposed to selecting the action that would globally maximise their expected payoff).  Furthermore, directional adjustments are subject to an additive disturbance, w_{i}\left(t\right) (a standard Wiener process), scaled by a parameter \sigma_{i}:

dx_{i}\left(t\right)=\pi^{e\prime}_{i}\left(x_{i}\left(t\right),t\right)dt+\sigma_{i}dw_{i}\left(t\right)

The authors justify the stochastic part of the local adjustment rule thus:

“errors” or “trembles” may occur because current conditions are not known precisely, expected payoffs are only estimated, or decisions are affected by factors beyond the scope of current expected payoffs, e.g. emotions like curiosity, boredom, inertia or desire to change.
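In discrete time, this adjustment rule is just an Euler–Maruyama scheme. Here is a minimal simulation sketch of my own, holding a hypothetical single-peaked expected payoff fixed over time:

```python
import numpy as np

def simulate(grad, n_agents=1_000, x0=0.5, sigma=0.3, dt=1e-3,
             steps=20_000, x_lo=0.0, x_hi=1.0, seed=0):
    """Euler-Maruyama discretisation of dx = pi^e'(x) dt + sigma dw,
    clipping to the action interval as a crude boundary rule."""
    rng = np.random.default_rng(seed)
    x = np.full(n_agents, x0)
    for _ in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_agents)
        x = np.clip(x + grad(x) * dt + sigma * dw, x_lo, x_hi)
    return x

# Hypothetical pi^e(x) = -(x - 0.2)^2, so pi^e'(x) = -2 * (x - 0.2)
x = simulate(lambda x: -2.0 * (x - 0.2))
print(x.mean(), x.std())  # mass piles up near 0.2, spread out by the noise
```

A histogram of the terminal decisions should already resemble the logit density exp(π^e(x)/ν) derived below, with ν = σ²/2.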

This decision rule translates into a differential equation for the distribution functions of decisions, F_{i}\left(x_{i},t\right).  In particular, it yields the Fokker-Planck equation common in theoretical physics, in which the first term on the right-hand side is a drift term and the second a diffusion term:

\frac{\partial F_{i}\left(x_{i},t\right)}{\partial t}=-\pi^{e\prime}_{i}\left(x_{i},t\right)f_{i}\left(x_{i},t\right)+\nu_{i}f^{\prime}_{i}\left(x_{i},t\right)

where \nu_{i}=\sigma^{2}_{i}/2.
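Differentiating the authors’ equation with respect to x gives the perhaps more familiar density form, \partial f_{i}/\partial t=-\partial\left(\pi^{e\prime}_{i}f_{i}\right)/\partial x+\nu_{i}\partial^{2}f_{i}/\partial x^{2}.  Here is a crude explicit finite-difference sketch of my own (same hypothetical payoff as above) showing the density relaxing toward the steady state identified next:

```python
import numpy as np

# Grid on a hypothetical action interval (x_L, x_H) = (0, 1)
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
nu = 0.045                                      # nu = sigma^2 / 2 for sigma = 0.3

drift = -2.0 * (x - 0.2)                        # pi^e'(x) for pi^e(x) = -(x - 0.2)^2

f = np.ones_like(x)                             # start from the uniform density
f /= f.sum() * dx
dt = 0.2 * dx**2 / nu                           # respect the explicit-scheme stability bound

for _ in range(50_000):
    flux = drift * f - nu * np.gradient(f, dx)  # probability flux J = pi^e' f - nu f'
    flux[0] = flux[-1] = 0.0                    # no-flux (reflecting) boundaries
    f = f - dt * np.gradient(flux, dx)          # df/dt = -dJ/dx
    f = np.clip(f, 0.0, None)
    f /= f.sum() * dx                           # renormalise against numerical leakage

target = np.exp(-(x - 0.2) ** 2 / nu)           # steady state: exp(pi^e(x)/nu), normalised
target /= target.sum() * dx
print(np.abs(f - target).max())                 # small residual, mostly from the boundaries
```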

In a steady state of this process, the right-hand side of the Fokker-Planck equation above is identically zero, allowing us to identify the steady-state distribution:

f_{i}\left(x\right)=\frac{\exp\left(\pi^{e}_{i}\left(x\right)/\nu_{i}\right)}{\int^{x_{H}}_{x_{L}}\exp\left(\pi^{e}_{i}\left(s\right)/\nu_{i}\right)ds}

where the t arguments have been dropped since the equation pertains to a steady state.  The n equations (one for each player) specified thus are not explicit solutions, since the expected payoff \pi^{e}_{i} on the right-hand side itself depends on the other players’ distributions; instead, they constitute equilibrium conditions for the steady-state distributions.  In a steady state, these conditions are satisfied simultaneously, and so we arrive at a logit equilibrium – a logit probabilistic choice function combined with a Nash-like consistency condition.
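The fixed-point character of the condition suggests a simple computation: discretise the action space and iterate between the logit density and the expected payoffs it induces. A sketch of mine for a symmetric two-player game with a hypothetical payoff (the paper does not prescribe this algorithm):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)                     # discretised action space
dx = x[1] - x[0]
nu = 0.045                                         # noise parameter nu = sigma^2 / 2
pi = x[:, None] * (1.0 - x[:, None] - x[None, :])  # hypothetical pi(x_i, x_j)

f = np.ones_like(x) / (len(x) * dx)                # start from the uniform density
for _ in range(500):
    pi_e = (pi * f[None, :]).sum(axis=1) * dx      # pi^e(x_i) = int pi(x_i, s) f(s) ds
    g = np.exp((pi_e - pi_e.max()) / nu)           # logit response (shift max for stability)
    g /= g.sum() * dx
    f = 0.5 * f + 0.5 * g                          # damped update toward the fixed point

print(x[np.argmax(f)])                             # mode of the logit-equilibrium density
```

For this payoff the symmetric Nash equilibrium is x = 1/3, and the printed mode should land close to it; the noise ν spreads the equilibrium density around that point.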

The authors then prove that such an equilibrium is stable for all potential games, where a potential game is one for which there exists a function, V\left(x_{1},...,x_{n}\right), such that \partial V/\partial x_{i}=\partial\pi_{i}/\partial x_{i} for all i.  (A quick symbolic check of this condition follows the list below.)

The class of potential games includes:

minimum-effort coordination games, linear/quadratic public goods and oligopoly games, and two-person 2 × 2 matrix games in which players select mixed strategies.
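To see what the condition demands, here is a quick symbolic check of my own (the example, a linear public goods game, is my parameterisation rather than the paper’s):

```python
import sympy as sp

n = 3
x = sp.symbols(f"x1:{n + 1}")          # decisions x1, x2, x3
m, e = sp.symbols("m e")               # marginal return and endowment (hypothetical)

# Linear public goods payoffs: keep your endowment minus your contribution,
# plus m times the sum of everyone's contributions.
pi = [e - x[i] + m * sum(x) for i in range(n)]

# Candidate potential: V = (m - 1) * total contributions
V = (m - 1) * sum(x)

# Potential-game condition: dV/dx_i == dpi_i/dx_i for every player i
print(all(sp.simplify(sp.diff(V, x[i]) - sp.diff(pi[i], x[i])) == 0 for i in range(n)))
```

Both derivatives equal m - 1 for every player, so the check prints True.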

Note, in particular, that “in the presence of noise, equilibrium behavior is not necessarily centered around the Nash prediction; errors that push one player’s decision away from a Nash decision may make it safer for others to deviate.”  This “is important when the Nash equilibrium is near the boundary of the set of feasible decisions, so that errors are biased toward the interior.”