Game-tree search: the Minimax and Alpha-Beta algorithms
August 17, 2007

Though they are quite different from each other Chess, Checkers, Go, Othello, Connect-4 and Tic-Tac-Toe also have similarities. They are all two-player zero sum perfect information games. The term two-player just means that there must be two opposing sides (football is considered two-player, for example). A zero sum game is one where an advantage for one player is an equally large disadvantage for the other. Perfect information basically rules out any game that has an element of chance. Yatzee? Right out the window. Poker? Forget about it. Jenga? Not even close.
For games that have these properties it is possible to set up a game tree to aid in the selection of the next move. For simplicity, consider the starting state of Tic-Tac-Toe to be the root of a tree. The root has nine branches, each leading to a successor state. Each of these has 8 branches leading to its successor states and so on. Some of the paths through the tree will end before others (a winning state is reached before all the slots have been filled) but some paths continue until depth 9 (or ply 9 in game-tree terminology), when all the slots have been filled.
After having exhausted the search space of the game, it is easy to find the paths that will lead to victory for either player. Knowing the path that X can take to the fastest victory is generally of little use, however, because O can thwart X’s plans of a swift victory any time it is her turn to move. Instead of traversing the path leading to the fastest possible victory, X’s best aim is to pick a path where her worst outcome will be victory (the best worst-case path). The Minimax game-tree search algorithm is designed to do just this, and alpha-beta pruning improves on it. This article tries to explain how both works.
The Minimax algorithm
Let us use Tic-Tac-Toe as an example when explaining how the Minimax algorithm works. The player whose turn it is to move at the root is called Max, and all even plies in the game-tree (i.e. the states where it is Max’ turn to move) are labelled accordingly. Max’ opponent is called Min. These names are adapted from their actions; Max is always trying to maximise her score, Min is always trying to minimise Max’ score.
To determine who won, a function to evaluate a game state is needed. The evaluation function takes a game state as its argument and should return a value indicating whether the state is a win, loss or draw for the player whose turn it is at that state. The function could for example return 1 if the state is a win, -1 if the state is a loss and zero if the state is a draw. The evaluation function is applied to all the leaf states. Leafs are game states without children, i.e. either a draw, or a win for one of the players.
After exhausting the tree and evaluating all leaf states, Minimax values are assigned to the internal states (all non-leaf states) of the game-tree. The Minimax values found in the leaf nodes are inverted and returned to their parent. Each parent picks the highest value returned to it, negates it, then returns the result to its parent, et cetera. The Minimax values returned to the root denote Max’ worst outcome for each corresponding move. All that remains for Max is to pick a move where her worst outcome is a win (if such a move exists–it does not in Tic-Tac-Toe).
The following method shows how the Minimax algorithm can be implemented in Objective-C. A noticeable difference from the description above is that the Minimax value is negated on return from the recursive call instead of before the return; this is how the Minimax algorithm is normally implemented.
-(int)minimaxWithState:(id)state player:(int)player
{
if ([state isEndOfGame])
return [state evaluateWithPlayer:player];
int score = -1; // value for loss
NSArray *moves = [state movesAvailable];
id enumerator = [moves objectEnumerator];
for (id m; m = [enumerator nextObjnextObject]; ) {
id s2 = [state successorStateWithMove:m];
int sc = -[self minimaxWithState:s2 player:3 - player];
if (sc > score)
score = sc;
}
return score;
}
The next listing shows the special Minimax function applied to the root of the tree (i.e. the position for which a move is sought). In discussions of the algorithm the notion of any special treatment of the root is often omitted; it is included here for the sake of completeness.
-(id)minimaxRootWithState:(id)state player:(int)player
{
id bestmove = nil;
int score = -1; // value for loss
NSArray *moves = [state movesAvailable];
id enumerator = [moves objectEnumerator];
for (id m; m = [enumerator nextObject]; ) {
id s2 = [state successorStateWithMove:m];
int sc = -[self minimaxWithState:s2 player:3 - player];
if (sc > score) {
bestmove = m;
score = sc;
}
}
return bestmove;
}
Depth-limited Minimax
Only when the search space is sufficiently small, like in our Tic-Tac-Toe example, is it possible to exhaust it fully using the Minimax algorithm. For practical applications this is almost never the case. Computers are not powerful enough to exhaust game-trees for practical applications where game-tree search would be desired. For example, it has been claimed that the game of Chess has more states than there are atoms in the known universe. Suffice to say that waiting for a search of that magnitude to finish becomes impractical.
A simple way of ensuring that the search will terminate in a practical timespan is to set a maximum limit on the depth of the search. The following method shows how the Minimax algorithm presented earlier can be amended to unconditionally stop after reaching a certain depth. Notice that the initial value for score has changed.
-(int)minimaxWithState:(id)state player:(int)player ply:(int)ply
{
if (!ply || [state isEndOfGame])
return [state evaluateWithPlayer:player];
int score = -1000; // value for loss
NSArray *moves = [state movesAvailable];
id enumerator = [moves objectEnumerator];
for (id m; m = [enumerator nextObject]; ) {
id s2 = [state successorStateWithMove:m];
int sc = -[self minimaxWithState:s2 player:3 - player];
if (sc > score)
score = sc;
}
return score;
}
Since the search may be terminated before it has reached the leaf nodes, the end states of many paths are lost. Thus the evaluation function will have to be enhanced: it must now be able to indicate how good non-terminal states in the game-tree are, in contrast to simply determining a win, loss or draw for an end state. Instead of returning -1, 0 or 1 the evaluation function must now return a value in a certain range (say, -1000 to 1000) indicating how good the state is. Performance of depth-limited Minimax algorithms greatly depends on how well the evaluation function identifies strong states.
Alpha-Beta pruning
In the late 50s it was realised that it was not necessary to visit all the nodes in a game-tree to correctly deduce its Minimax value. Uninteresting branches of the tree can be pruned away. Remember that the Minimax algorithm produces the value of the best worst-case. Alpha-Beta pruning terminates the search of a subtree as soon as it knows that the worst-case for the subtree is worse than previously searched paths. The idea is that if a path is worse than the current best path, time is not wasted trying to find out how bad it is.
To accomplish the pruning mentioned above two bounds are passed to a modified Minimax algorithm. The bounds are the highest (beta) and lowest (alpha) value that can affect the Minimax value at that point, and are continually updated as the search progresses. Since the Minimax value is negated at each step, the states of the bounds must also be negated and their states switched as they are passed on to the next level. If the Minimax value returned from a path is greater than or equal to the high bound, the path is pruned. Here’s an example:
-(int)alphaBetaWithState:(id)state
player:(int)player
ply:(int)ply
alpha:(int)alpha
beta:(int)beta
{
if (!ply || [state isEndOfGame])
return [state evaluateWithPlayer:player];
NSArray *moves = [state movesAvailable];
id enumerator = [moves objectEnumerator];
for (id m; m = [enumerator nextObject]; ) {
id s2 = [state successorStateWithMove:m];
int sc = -[self alphaBetaWithState:s2
player:3 - player
ply:ply-1
alpha:-beta
beta:-alpha];
if (sc > alpha)
alpha = sc;
if (alpha >= beta)
break; // prune branch.
}
return alpha;
}
In a worst-ordered tree (where the paths are ordered so that no pruning occurs) the Alpha-Beta algorithm visits the same number of leaf nodes as Minimax. On average it performs a lot better. Given a perfectly ordered tree, where the branches are pruned as early as possible, the Alpha-Beta algorithm can search twice as deep as the Minimax algorithm in the same timespan.
This post has been adapted from a section of my 2003 BSc Artificial Intelligence report on Generalised Game-Tree Search at the University of Westminster.
Introducing Phage
March 27, 2007
Phage is a strategy game for 2 players, where you play against an AI (built using my game-tree search framework). The interface is a bit crude—click origin, then click target instead of drag and drop—and the featureset is somewhat limited, but it’s got the basics (making moves, undo) and I’d like it out there for people to play with.

Previously I’ve just been re-implementing old classics (like Reversi & Connect-4), but this is completely original. After seeing some of my game-tree search work online Steve Gardner, Phage’s inventor, contacted me about creating a computer version of his game. So I did, after stalling mulling it over for about a year.
I created the core of the game logic at a hack day four weeks ago and a first stab at an interface the following weekend. A new hack day is coming up, and assuming I get the chance to hack more on Phage then, what do you think I should focus on next? A more challenging AI? Drag-n-drop of pieces? Highlight possible moves? Answers in the comments please!
SBAlphaBeta 0.2 released!
March 26, 2007
SBAlphaBeta is a Foundation framework for creating AIs for many 2-player games. It encapsulates the Alpha-beta algorithm. No prior experience with Artificial Intelligence is necessary in order to use this framework.
Version 0.2 is a “back to sanity” release, incorporating improvements found by developing more games against the framework. Changes include: classes and interfaces have received a common prefix. The interfaces for states have been renamed. There are now some minor restrictions on moves, which allows us to do more error checking. Several “convenience methods” ultimately turned out to be confusing and have been dropped; many of the remaining methods have been renamed for clarity.
Neural net Connect4 status update
December 18, 2006
As I have noted before I have been trying to use a genetic algorithm to evolve a multi-level perceptron (MLP—a specific type of neural net) to play Connect4 better than my hand-written player. My efforts have yet to bear fruit, and my patience is wearing thin. After multiple revisions to the GA and meticulous training for over 300 generations it is still not good enough—so I’ve decided to cheat.
I freely admit that my goal is not as pure as David & Kumar’s: they wanted to find out if Blondie24 could learn to play Checkers well completely on her own, while I am only concerned with building a better AI for my Connect4 game. It occurred to me that I can train MLPs to mimic the output from my existing evaluation function and seed my GA with them. This way the starting population already have nets as good as the best I can come up with—hopefully the GA should be able to pick things up from there.
Creating training sets was pretty easy. I wrote a short program to do a depth-first search of the valid states and spit out the first ten thousand unique ones, with their associated fitness. The NSSet Objective-C class proved very useful for this.
The second step in my plan, implementing back-propagation in my MLP class, was considerably harder. I ended up spending countless hours poring over books and some code I wrote for a coursework 4–5 years ago (we were predicting USD/GBP exchange rates). Even though the resulting code is the mental equivalent of a tongue-twister—lots of indexing into tightly coupled two- and three-dimensional arrays of various lengths—I’m actually really happy with it, and I’ve decided to release it as a separate project.
Now all I have to do is write a small program to create the initial population for my GA. I’ll keep you posted.
Playing at the edge of AI
November 24, 2006
Reading Blondie24, where I got the title of this post from, got me exited about neural nets and genetic algorithms again. I always did found them fascinating, but I didn’t really have any practical use for them myself—until now.
My goal for the Cocoa games project, and ggtl before it, has always been to make building of decent computer game AIs very easy (for two-player zero-sum perfect information games, at least, which is actually a rather large group). My focus has been on the scaffolding—encapsulating the Alpha-Beta algorithm and letting developers focus on implementing its key components: detecting legal moves, applying a move to a state, and the evaluation function.
While the first two are quite uninteresting (they’re just game-specific scaffolding), the evaluation function is what really makes or breaks the resulting AI. Given an arbitrary state, this function must determine its fitness; how good or bad the state is. If you are an experienced player you might instinctively know if a given state is good, but it may not be a simple task to translate that to computer code. While computer games AIs have traditionally relied on exploiting knowledge provided by human experts, David Fogel and Kumar Chellapilla set out to see if they could create a program that could learn to play checkers on its own. To achieve this they used a genetic algorithm to evolve a neural net which they used as their program’s evaluation function.
David and Kumar were remarkably successful: Blondie learned to play checkers to a human Master level. If I could reproduce their results in a way that would work for a broader range of games, my goal of making board-game AI creation very simple would be largely achieved.
Fast forward a few months. I have implemented a neural net evaluation function for my Connect 4 game. The net is somewhat simpler in construction than Blondie’s, though I copied the techniques used for evolving it straight from the book. After 25 generations I stopped the evolution and played a game against the best neural net so far; it beat me hands down. Unfortunately, after 100 generations my best neural net still haven’t been able to beat the hand-rolled evaluation function I wrote for the game originally (I must have had a stroke of genius back then).
This is where I am currently at. I don’t want to release a game with an AI inferior to that in the previous version (it is available from subversion though), and I am currently experimenting with a few possible ways to improve the training of the neural net. Rest assured any progress I make will be the topic of future posts.