Pages

Sunday 19 October 2014

Tic-Tac-Toe on Arduino (MiniMax)

While playing around with Arduino and Nokia LCD Screen I've decided to create some simple game. Obviously there's nothing simpler than tic-tac-toe. The game itself is trivial, everyone knows it so it should be pretty simple to learn the machine to be a good opponent. Yeah... Of, course I could write something really simple, like a translation table, in a sense that if player made a move A, then computer should do a move B... and describe all of the possible situations or "strategies". But that would be pretty primitive and not very appealing, wouldn't it ?

Instead the obvious choice is to implement the MiniMax algorithm for this game. Of course, there a dozens of great articles available describing the basics and fundamentals of MiniMax and especially in tic-tac-toe context so, I'm not going to go into details regarding how it works. Instead I would like you to have a look on any of the links bellow to get at least some essentials:


Tic-Tac-Toe indeed is a very simple game, even in MiniMax context; in comparison to chess or checkers or any other game alike. However the game tree may become pretty complex anyway. Let's think about it. The first level of the tree will have 9 nodes (since there's 9 possible initial moves), the second will have 9*8 nodes, the third 9*8*7 ... and so on, so already on the third level we will have a total of 9 + 9*8 + 9*8*7 = 585 nodes. OK, assuming that we game tree node is defined the following way:

struct node_t {
char *children;
// it's either 0 - empty, 1 - X, 2 - O, 
// two bits per field, 9*2 = 18 bits = 3 bytes
char game[3];
char children_cnt;
};

Which is 6 bytes per node on Arduino, and on the first level we'll need 8 children, which would give roughly:

  1. lvl: 9*6 = 54
  2. lvl: 9*8*6 = 432
  3. lvl: 9*8*7*6 = 3024
= 3024 bytes of memory only for 3 level deep tree ! That's a little disappointing. Even though, there's still some room for optimization in the tree node definition it's obvious that this approach is a dead end.

It's still possible to build and analyze the whole game tree even in 2kB of RAM with which Arduino is equipped. The answer is: recursion. In worst case (first move), the function will never recurse deeper than 8 times. This is pretty good it means that even if we would dedicate 1 kB of RAM only for the function to solve the current game state it would give 128 bytes available for local data - which in this case is a lot more than enough. 

Let's consider the general pseudocode for negamax first (negamax is a version of minimax for symmetrical games - meaning, player's A win is player's B loss):

int negaMax( int depth  ) {
    if ( depth == 0  ) return evaluate();
    int max = -oo;
    for ( all moves )  {
        score = -negaMax( depth - 1  );
        if( score > max  )
           max = score;
    }
    return max;
}

Let's try to put it more in the context of the Tic-Tac-Toe game. For the sake of simplicity first let's operate on the pre-build game-tree:

struct node {
struct node *children[8];
    uint8_t _ch_no;
    uint8_t grid[3];
};

int negamax(struct node *game_state, int depth) {
    if (d == MAX_DEPTH) return evaluate();
    for (int i = 0; i < game_state->_ch_no; i++) {
score = -1 * negamax(game_state->children[i], depth + 1);
if (score > max_score) max_score = score;        
}
return max_score;
}


That's a simplified version, but well describing the intention. We iterate through all children of the current game state as player A looking for the worst move of player B. We know though, that having the full game tree pre-built is a luxury. Instead we must build each game state we want to analyse iteratively in order to consider it, let's look on the modified code:

int negamax(int *grid, int depth) {
if (d == MAX_DEPTH) return evaluate();
for (int i = 0; i < GRID_SIZE; i++) {
// looking for an empty field in the grid
if (grid[i] != EMPTY) continue;
grid[i] = MARK_X;
score = -1 * negamax(grid, depth + 1);
grid[i] = EMPTY;
if (score > max_score) max_score = score;
}
return max_score;
}

That's pretty simple. Instead of having a pre-built list of nodes we simply create every node on the fly and analyse it, after that the game grid is restored to it's original state. That's the essence. This will be slightly slower than having the tree but it will work and it's good enough.

The key to a good playing program is the evaluation function. That function takes the current game state and should return how likely it is for the current player to win. If your program is doing simple mistakes or looses in "obvious win" situation the evaluation function is to blame. There are many approaches, to count the potentially winning rows and columns, count empty columns etc. It all comes down to exploring all the different possibilities. 

It took me a while to test all the different approaches and decide on the evaluation function. Finally I've decided to describe each game using a score between -100..100. A game which is a loss gets a score of -100, a game which is a win gets 100 points. What about all the values in between ? I'm using them to score a non-terminal game state. I'm counting all the rows, columns and diagonals on the board which do not contain the opponent's marking - meaning they can be potential wins. I'm doing the same for the opponent. After that I have two numbers: A- potential winning rows,cols and diags for player A and B - potential winning rows,cols & diags for player B. The game is scored by substracting those two numbers. Let's consider the following examples:

Game Evaluation example.
The values will vary within range [-8..8] which isn't much in comparison to the win/loss score, it's enough though to take a reasonable game decisions (well... in most cases :)). 

It's time to consider the full code.



This code is machine independent and can be compiled and executed on a standard x86 as well. Let's try that first. Since I was kinda lazy about the interface and my goal was to implement the MiniMax algorithm, the interface is clunky - but usable. In order to play type a number 0 - 8 in order to mark the field on the game grid. Human player starts first and uses the X mark. Let's play a game:

$ make
$ ./ttt
- - -
- - -
- - -
4
- - -
- X -
- - -
CPU
 O - -
 - X -
 - - -
2
O - X
- X -
- - -
CPU
 O - X
 - X -
 O - -
3
O - X
X X -
O - -
CPU
 O - X
 X X O
 O - -
1
O X X
X X O
O - -
CPU
 O X X
 X X O
 O O -
8
O X X
X X O
O O X
CPU
 O X X
 X X O
 O O X

Not bad. I must say that it plays pretty good in most of the cases. Try it out yourself. Putting this code onto an Arduino shouldn't be a problem now. The execution speed comparison may be interesting as well. As far as on a pretty old Core 2 Duo laptop there was no delay visible, I'm sure that little Atmega will require some time (at least in the beginning) to determine the best move.

Let's prepare the project. Clone the following repository

$ git clone git@github.com:dagon666/dev_MiniMax.git
$ cd dev_MiniMax

We need libpca as well:
$ git clone git@github.com:dagon666/avr_Libpca pca

Now, we can build it:

$ cd arduino
$ make
$ make install

As expected, since initially the tree of possible games to explore is quite bit it takes around 2 - 3 seconds for the Arduino to find the first move. The game set is much smaller afterwards - there seems to be no delays in second and further moves.

Just for fun (since I had it already connected) I decided to run the game on the Nokia LCD as well (there is an option in the code to do that - just change the LCD_OUTPUT to 1 in main.c). Here's how the game looks on the LCD:









I hope that this post was useful and educating. Have fun programming!