More Resources

A reinforcement learning algorithm for obtaining the Nash equilibrium of multi-player matrix games.


1. Introduction

Globalization has played a significant role over the last decade in transforming the marketplace into one where most goods and services are transacted through multi-party competition. Consequently, the study of game-theoretic concepts and the development of effective methods for solving multi-player games have gained increasing attention in the research literature. Games occur in two primary forms: matrix games and stochastic games. An n-player matrix game is characterized by n different reward matrices (one for each player) and a set of action combinations characterizing the equilibria (Nash equilibria, in particular). Nash (1951) defined equilibrium to be an action combination from which no single player could unilaterally deviate to increase profit. Stochastic games are comprised of finite or infinite-horizon stochastic processes with finite states and state transition probability structure, in which the players seek equilibrium actions for every state so as to maximize their rewards from the overall game. Therefore, stochastic games are construed as a sequence of matrix games (one for each state) connected with transition probabilities. Further classification of games arises from the nature of the reward structure: zero-sum games and nonzero (general) sum games. Rewards of stochastic games are classified as discounted reward, average reward and total reward.

Though the fundamentals of game theory are fairly well established (Nash, 1951), the computational difficulties associated with finding Nash equilibria have constrained the scope of the research literature largely to the study of bimatrix games with limited action choices. Even in the absence of sufficient tools to appropriately analyze stochastic or matrix games, a majority of the marketplaces have evolved to incorporate transactions through competition. Therefore, to ensure healthy growth of the current competition-based economy, it is imperative to develop computationally feasible tools to solve large-scale stochastic and matrix games. In recent years, researchers have been able to characterize equivalent matrix games for both discounted reward and average reward stochastic games (Filar and Vrieze, 1997; Hu and Wellman, 1998; Borkar, 2002; Li et al., 2007). They also harnessed the advances in reinforcement-learning-based techniques to construct these equivalent matrix games (Hu and Wellman, 1998; Li et al., 2007). However, obtaining the Nash equilibrium for these equivalent matrix games has remained an open research issue, which is the focus of this paper.

As discussed in McKelvey and McLennan (1996), the appropriate method of computing Nash equilibria of a matrix game depends on:

(a) whether it is required to find one or all equilibrium points;

(b) the number of players in the game;

(c) the importance of the value of the Nash equilibrium.

No computationally viable method addressing all of the above is available in the existing literature. Nash equilibria of n-player matrix games can be obtained by solving a Non-linear Complementarity Problem (NCP), which for a two-player matrix game becomes a Linear Complementarity Problem (LCP) (McKelvey and McLennan, 1996). Lemke and Howson (1964) developed an efficient algorithm for obtaining Nash equilibria for bimatrix games by solving the associated LCP. Their algorithm was extended for finding Nash equilibria of n-person matrix games by Rosenmuller (1971) and Wilson (1971). However, these algorithms still have unresolved computational challenges. Mathiesen (1985) proposed a method of solving a NCP for n-player matrix games through a sequence of LCP approximations. A survey by Harker and Pang (1990) summarizes these and other developments on this topic. It may be noted that these methods are not guaranteed to obtain global convergence and often depend on the choice of the starting point. To our knowledge, the only openly available software that attempts to solve multi-player matrix games is GAMBIT (McKelvey et al., 2007). However, as observed by Lee and Baldick (2003), this software takes an unusually long computation time as the number of players and their action choices increase.

Game-theoretic models have been studied extensively in examining market competition in the energy and transmission segments of restructured power markets (as in Pennsylvania-Jersey-Maryland, New York, New England and Texas). These games are characterized by multidimensional bid vectors with continuous parameters. Upon suitable discretization of these bid vectors, many of these games can be formulated as matrix games. The degree of discretization dictates both the computational burden and the probability of identifying the Nash equilibria. Almost all of the literature studying power market games is devoted to optimization-based approaches, such as mathematical programming (Hobbs, 2001; Yao et al., 2003; Hobbs et al., 2006), co-evolutionary programming (Price, 1997) and exhaustive search (Cunningham et al., 2002). Even in a limited number of studies, where such games are formulated as matrix games, numerical examples are converted to bimatrix games and are solved using linear programming and LCP approaches (Ferrero et al., 1999; Stoft, 1999; Lee and Baldrick, 2003).

The mathematical programming approach to finding the Nash Equilibrium (NE) of matrix games has two primary variants: a Mathematical Program with Equilibrium Constraints (MPEC), see Luo et al. (1996), and Equilibrium Problem with Equilibrium Constraints (EPEC), see Su (2005). MPEC is a generalization of bilevel programming, which in turn is a special case of hierarchical mathematical programming (with two or more levels of optimization). MPECs resemble Stackelberg (leader-follower) games, which form a special case of the Nash game. In a Nash game each player possesses the same amount of information about competing players, whereas, in Stackelberg-type games, a leader can anticipate the reactions of the other players, and thus possesses more information in the game. The leader in a Stackelberg game chooses a strategy from his/her strategy set, and the followers choose a response based on the leaders actions (Luo et al., 1996), whereas in a Nash game all players choose actions simultaneously. When multiple players face optimization problems in the form of MPECs, EPEC models have been used to simultaneously find the equilibria of the MPECs (Cardell et al., 1997; Yao et al., 2003; Su, 2005; Ralph and Smeers, 2006).

The primary contribution of this paper is a novel stochastic-approximation-based reinforcement learning algorithm for obtaining NE of n-player matrix games. Extensive numerical experimentation is presented in Section 4, which demonstrates the ability of the learning algorithm to obtain NE. This section includes 16 matrix games with up to four players and 64 actions for each player, followed by an example of a restructured power network with competing generators. The numerical results indicate that the learning-based approach presented in this paper holds significant promise in its ability to obtain NE for large n-player matrix games. To our knowledge, the algorithm is the first of its kind that harnesses the power of the stochastic value approximation method that has been successfully used in solving large-scale Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems with single decision makers (Das et al., 1999; Gosavi et al., 2002; Gosavi, 2004). A formal proof establishing the convergence of the algorithm to NE solutions is not fully developed yet, and is currently being investigated. However, as discussed in the numerical evaluation section, the empirical evidence clearly indicates the algorithm's ability to converge to NE solutions.

In what follows, we first present a formal definition of a matrix game and its NE (Section 2). Thereafter, we discuss the key results from the literature that show that for both discounted and average reward stochastic games there exist equivalent matrix games. We subsequently present the learning algorithm, a detailed numerical study and concluding remarks in Sections 3, 4 and 5 respectively.

2. Matrix games

A matrix game can be defined by a tuple (n, [A.sup.1],...,[A.sup.n], [[~.R].sup.1],...,[[~.R].sup.n])). The elements of the tuple are as follows.

n denotes the number of players.

[A.sup.k] denotes the set of actions available to player k.

[r.sup.k] : [A.sup.1] x ... x [A.sup.n] [right arrow] R is the payoff function for player k, where an element [r.sup.k] ([a.sup.1],..., [a.sup.n]) is the payoff to player k when the players choose actions a = ([a.sup.1],..., [a.sup.n]).

[[~.R.sup.k] for all k, can be written as an n-dimensional matrix as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The players select actions from the set of available actions with the goal of maximizing their payoffs, which depend on all the players' actions. The concept of a NE is used to describe the strategy as being the most rational behavior by the players acting to maximize their payoffs. Thus, for a matrix game, a pure-strategy NE is an action profile a* = ([a.sub.*.sup.1],..., [a.sub.*.sup.n]), for which [r.sup.k] ([a.sub.*.sup.k], [a.sub.*.sup.-k]) [greater than or equal to] [r.sup.k] ([a.sup.k], [a.sub.*.sup.-k]), [for all][a.sup.k] [member of] [A.sup.k], and k = 1, 2,..., n. The equilibrium values denoted by Val[*] for player k with payoff matrices [~.R.sup.k] is obtained as Val[~.R.sup.k] = [r.sup.k] ([a.sub.*.sup.1],..., [a.sub.*.sup.n]). The appealing feature of the NE is that any unilateral deviation from it by any player is not worthwhile. A mixed-strategy NE for matrix games is a vector ([[pi].sub.*.sup.1],..., [[pi].sub.*.sup.n]), for which we can write:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Page 1 2 3 4 5 Next »
COPYRIGHT 2009 Institute of Industrial Engineers, Inc. (IIE) Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.

Copyright 2009 Gale, Cengage Learning. All rights reserved. Gale Group is a Thomson Corporation Company.

NOTE: All illustrations and photos have been removed from this article.


Marketplace

Learn how to distribute a press release

Try our new online printing. theupsstore.com/print
Today on Entrepreneur

Sign Up for the Latest in:
Online Business
Franchise News
Starting a Business
Sales & Marketing
Growing a Business

E-mail*

Zip Code*