Hi Dmitry,

If you don't mind, I put the mailing list in copy so that everybody can read this.

Hello Antoine,

I was reading the Computer Go mailing list and saw your message titled "Fast play on 9x9 board using bitmaps". I am currently trying to implement a fast bitmap go board. Therefore I am very interested in your work. Could you possibly share your expertise? I would like to know what data structure you used, how you kill groups and detect liberties? Your advice would be extremely valuable for me.

Your Sincerely,
Dmitry Kamenetsky


The idea was to take advantage of the 2 following requirements:
- 9x9 board (board state can be represented as 2 128-bits words)
- no need to undo (target was MC-Go)

At that time, I was also impressed by the potential of the new Core2 architecture that is able to perform full 128-bits ops 3 times faster than the P4 (with a better latency that also helps to reduce data dependency).

I played a few days with the idea it was possible to do something, and developped a piece of experimental code. Unfortunately the machine I used is a Pentium M 1.5 MHz (not even 64-bit), so the results does not really represent what could be achieved on the Core2 architecture.

Speed test was done in the following way:
- start from empty board
- play 110 random moves (110 because it was reported as the average number of moves per game in MC-Go)

I reached about 450 cc (clock cycle) per move on average. This timing includes the random move generator, but there is absolutely no logic for eye detection, nor any evaluation of terminal positions. An interesting point is that the random move generator takes about half of the time (200 cc per move on average for random generation vs 250 cc for move playing). Doing this I realized that it was not easy to generate random moves (keeping uniform distribution of course).

The following data is maintained after each move (bitmap = 128-bit word):
- bitmap of empty points
- bitmap of black points
- bitmap of white points
- bitmap of points with a stone
- for every block: bitmap of stones in the block, bitmap of liberties
- white/black bitmaps after last 2 moves (used to test Ko condition)
- and probably a few other small things I forgot

I also have precomputed bitmaps, for each point:
- bitmap with this unique point set (used to initialize a new block when a stone is played here)
- bitmap with 4-neighbour set (used to initialize the liberty bitmap)

Playing a stone is done in the following way:
- get precomputed bitmaps (stone/liberty) for this stone in order to initialize block data
- get stones/liberty bitmap for connecting blocks and merge them
- get stones/liberty bitmap for opposite neighbour
- substract each other liberties
- check if there are some empty liberty bitmap (for surrounding opposite blocks or played stone)
- remove possible dead stones

All the operation are logical operation (intersection/union/difference) that need only one instruction. Checking if a bitmap is empty needs a few operations.

In order to get the stone/liberty bitmap from a point (you need this when you check surrounding blocks) I use a classical chained list: the last point played in a block always becomes the root (and block data is stored under the root). When you connect, the root of connected blocks will point to the new root. When you have a point, you follow the link until you reach the root.

As I said before, the random move generator is probably where there is much improvement to do. Random number generator are fast (MT19937 needs 20 cc for each 32-bit random number). Since a byte is sufficient to represent a potential move, it takes 5 cc per potential move. The trouble is to map this byte to empty positions on the board (that are only described by a bitmap), and to keep uniform distribution. I adopted the following 2-step scheme:

Step1
- get a random byte, perform a bit test in the empty point bitmap: if point is empty this is our move, otherwise throw the random byte away
 - do it 3 more time
- if this fails, probably only a few bits are empty and we could wait a long time before we select one. Goto step2
Step2
- build a compact representation of the empty bits: parse the bitmap and build an array with empty bits indexes, let N be the size of the array (this is also the number of empty bits), and N' the power of 2 immediately above N - get a random byte R, if R modulo N' is above N throw it, otherwise get the index of empty bit at position R (mod N') in the array. Note that you can't take R modulo N as you would loose uniformity
 - continue until you find an empty bit

Of course building the array in step 2 is a costly operation. But staying in step 1 until you find a bit is much more time expensive (imagine how long you can wait if you have only one empty bit).

I did not code with extreme care the random generation. It could probably be more efficient. Another way to improve it is to try to maintain the array of step 2 incrementaly, in order to avoid the cost of rebuilding. I strongly believe it could improve timing (provided you don't spend to much time: if it's too hard just invalidate it, and start from step 1 the next move). I did not implement this kind of improvement.

I did not make further experimentation as I'm currently not that much interested in MC-Go. I was also a little bit disappointed, because the board implementation I use plays 300 random moves on a 19x19 board with 320 cc (play + undo) per move and provides much more information. The bitmap implementation needs 250 cc for play only, has limited data structure, and would perform poorly on 19x19 board.

However, once again results would probably be very different on a Core2: more efficient 128-bit operations, more registers (I'm still with a 32-bit processor) that would help to improve data dependency and to reduce memory access...

Best regards,

   Antoine







_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to