Hi Dmitry,
If you don't mind, I put the mailing list in copy so that everybody can
read this.
Hello Antoine,
I was reading the Computer Go mailing list and saw your message titled
"Fast play on 9x9 board using bitmaps". I am currently trying to
implement a fast bitmap go board. Therefore I am very interested in
your work. Could you possibly share your expertise? I would like to
know what data structure you used, how you kill groups and detect
liberties? Your advice would be extremely valuable for me.
Your Sincerely,
Dmitry Kamenetsky
The idea was to take advantage of the 2 following requirements:
- 9x9 board (board state can be represented as 2 128-bits words)
- no need to undo (target was MC-Go)
At that time, I was also impressed by the potential of the new Core2
architecture that is able to perform full 128-bits ops 3 times faster
than the P4 (with a better latency that also helps to reduce data
dependency).
I played a few days with the idea it was possible to do something, and
developped a piece of experimental code. Unfortunately the machine I
used is a Pentium M 1.5 MHz (not even 64-bit), so the results does not
really represent what could be achieved on the Core2 architecture.
Speed test was done in the following way:
- start from empty board
- play 110 random moves (110 because it was reported as the average
number of moves per game in MC-Go)
I reached about 450 cc (clock cycle) per move on average. This timing
includes the random move generator, but there is absolutely no logic for
eye detection, nor any evaluation of terminal positions. An interesting
point is that the random move generator takes about half of the time
(200 cc per move on average for random generation vs 250 cc for move
playing). Doing this I realized that it was not easy to generate random
moves (keeping uniform distribution of course).
The following data is maintained after each move (bitmap = 128-bit word):
- bitmap of empty points
- bitmap of black points
- bitmap of white points
- bitmap of points with a stone
- for every block: bitmap of stones in the block, bitmap of liberties
- white/black bitmaps after last 2 moves (used to test Ko condition)
- and probably a few other small things I forgot
I also have precomputed bitmaps, for each point:
- bitmap with this unique point set (used to initialize a new block when
a stone is played here)
- bitmap with 4-neighbour set (used to initialize the liberty bitmap)
Playing a stone is done in the following way:
- get precomputed bitmaps (stone/liberty) for this stone in order to
initialize block data
- get stones/liberty bitmap for connecting blocks and merge them
- get stones/liberty bitmap for opposite neighbour
- substract each other liberties
- check if there are some empty liberty bitmap (for surrounding opposite
blocks or played stone)
- remove possible dead stones
All the operation are logical operation (intersection/union/difference)
that need only one instruction. Checking if a bitmap is empty needs a
few operations.
In order to get the stone/liberty bitmap from a point (you need this
when you check surrounding blocks) I use a classical chained list: the
last point played in a block always becomes the root (and block data is
stored under the root). When you connect, the root of connected blocks
will point to the new root. When you have a point, you follow the link
until you reach the root.
As I said before, the random move generator is probably where there is
much improvement to do. Random number generator are fast (MT19937 needs
20 cc for each 32-bit random number). Since a byte is sufficient to
represent a potential move, it takes 5 cc per potential move. The
trouble is to map this byte to empty positions on the board (that are
only described by a bitmap), and to keep uniform distribution. I adopted
the following 2-step scheme:
Step1
- get a random byte, perform a bit test in the empty point bitmap: if
point is empty this is our move, otherwise throw the random byte away
- do it 3 more time
- if this fails, probably only a few bits are empty and we could wait
a long time before we select one. Goto step2
Step2
- build a compact representation of the empty bits: parse the bitmap
and build an array with empty bits indexes, let N be the size of the
array (this is also the number of empty bits), and N' the power of 2
immediately above N
- get a random byte R, if R modulo N' is above N throw it, otherwise
get the index of empty bit at position R (mod N') in the array. Note
that you can't take R modulo N as you would loose uniformity
- continue until you find an empty bit
Of course building the array in step 2 is a costly operation. But
staying in step 1 until you find a bit is much more time expensive
(imagine how long you can wait if you have only one empty bit).
I did not code with extreme care the random generation. It could
probably be more efficient. Another way to improve it is to try to
maintain the array of step 2 incrementaly, in order to avoid the cost of
rebuilding. I strongly believe it could improve timing (provided you
don't spend to much time: if it's too hard just invalidate it, and start
from step 1 the next move). I did not implement this kind of improvement.
I did not make further experimentation as I'm currently not that much
interested in MC-Go.
I was also a little bit disappointed, because the board implementation I
use plays 300 random moves on a 19x19 board with 320 cc (play + undo)
per move and provides much more information. The bitmap implementation
needs 250 cc for play only, has limited data structure, and would
perform poorly on 19x19 board.
However, once again results would probably be very different on a Core2:
more efficient 128-bit operations, more registers (I'm still with a
32-bit processor) that would help to improve data dependency and to
reduce memory access...
Best regards,
Antoine
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/