[computer-go] Slightly improved MC algorithm

Heikki Levanto Tue, 27 Feb 2007 12:38:55 -0800

Hi,

I thought I'd report a small change I have made to the plain MC
algorithm. I have been unhappy with the fact that the result of the game
gets pressed into one bit (win/lose), and all other information is
discarded. This leads to silly endgame moves, since the algorithm sees
no difference between a 1-point win and a 100-point win.


After thinking a bit about this, I decided to keep the whole board image
at the end. In other words, I have one board-sized array of ints, and at
the end, when scoring the result of a simulation, I add +1 to every
point that belongs to black, and -1 to every point that belongs to
white. Finally I divide this by the number of simulations I have done. 
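
As a rough sketch of this bookkeeping (in Python for brevity, not the C++ of the actual program; the function name, the flat-list board layout, and the use of 0 for neutral points are assumptions made for illustration, not taken from Halgo):

```python
def accumulate_ownership(final_boards):
    """Average per-point ownership over a set of finished playouts.

    final_boards: iterable of equal-length lists, one entry per board point,
    holding +1 where the point ended up black and -1 where it ended up white
    (0 for a neutral point is an assumption of this sketch).
    Returns a list of floats in [-1.0, 1.0], one per board point.
    """
    totals = None
    count = 0
    for board in final_boards:
        if totals is None:
            totals = [0] * len(board)
        for i, owner in enumerate(board):
            totals[i] += owner
        count += 1
    if count == 0:
        raise ValueError("need at least one playout")
    # Divide by the number of simulations to get an average per point.
    return [t / count for t in totals]


# Three made-up playout results on a tiny 4-point "board".
playouts = [
    [+1, +1, -1, -1],
    [+1, -1, -1, -1],
    [+1, +1, +1, -1],
]
ownership = accumulate_ownership(playouts)
print(ownership)
```

A value near +1 or -1 means the point went to the same colour in almost every playout; a value near 0 means it is contested.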

When evaluating the simulation, I sum up the points on this board. A
simple sum didn't work so well. I thought squaring the numbers
before summing would be better, but in fact it was worse. Now I end up
with the following:
   sum += sign(v) * pow(fabs(v), exp);
where exp is something like 0.5.
I did run one player with exp=0.8, and it seemed to perform about
equally well.
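
To make the effect of the exponent concrete, here is a small sketch of such a summing function (Python, with illustrative names; the ownership values are invented, and the absolute value is taken before the power because a negative base with a fractional exponent has no real result):

```python
import math


def evaluate(ownership, exponent=0.5):
    """Sum sign(v) * |v|**exponent over all board points.

    ownership: per-point averages in [-1.0, 1.0], positive for black.
    exponent:  1.0 gives the plain sum, 2.0 the squared variant,
               0.5 or 0.8 the pow variants.
    """
    total = 0.0
    for v in ownership:
        # copysign restores the sign of v after the power of |v|.
        total += math.copysign(abs(v) ** exponent, v)
    return total


# Invented ownership values: two settled points, three uncertain ones.
example = [1.0, 0.25, -0.25, -1.0, 0.04]
for e in (0.5, 0.8, 1.0, 2.0):
    print(e, evaluate(example, e))
```

With an exponent below 1, a point that is only slightly more often black than white (0.04 here) contributes much more than it does in the plain sum, while squaring shrinks it to almost nothing.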

As a side benefit, I get to know which points clearly belong to black or
white, and can easily skip unnecessary endgame moves (filling my own
territory down to one-point eyes, making silly invasions into enemy
territory, etc). I believe this could be used for much more clever
pruning, but I have not (yet?) done so.
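
One simple way to do that kind of skipping might look like the sketch below (Python; the function name and the 0.9 threshold are hypothetical and not taken from Halgo, which may decide this differently):

```python
def prune_settled_points(legal_moves, ownership, threshold=0.9):
    """Keep only moves on points that are not yet clearly owned.

    legal_moves: indices of candidate points.
    ownership:   per-point averages in [-1.0, 1.0].
    threshold:   hypothetical cut-off; a point with |ownership| at or
                 above it is treated as settled for either colour.
    """
    return [m for m in legal_moves if abs(ownership[m]) < threshold]


# Invented values: points 0 and 3 are settled, 1 and 2 are still open.
ownership = [0.98, 0.4, -0.1, -0.95]
remaining = prune_settled_points([0, 1, 2, 3], ownership)
print(remaining)
```

If nothing is left after pruning, passing is the natural choice.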

The best of these experiments, Halgo-1.5-500k, reached almost 1400 Elo on
cgos, which I think is pretty good for plain MC without any trickery.
Here are my notes on the versions I had there:

  Halgo1         - 1250 - The very first one
  Halgo-1.0-500k -  477 - Traditional MC. Something wrong with it?
  Halgo-1.1-500k - 1280 - Simple MC with area evaluation
  Halgo-1.2-500k - 1053 - Same as 1.1, but with a non-linear scoring function (s*v*v)
  Halgo-1.3-500k - 1400 - Same as 1.1, but with pow(v,0.8) as the summing function
  Halgo-1.4-500k -  492 - Same as 1.1, but with plain sign(v) as the summing function
  Halgo-1.5-500k - 1390 - Going for the pow function again, with 0.5. About the same as 1.3

I am not sure if there was something wrong with Halgo-1.0; it was
supposed to play 'traditional' MC, counting only which side won each game.

My code is heavily based on the Efficient Go Library by Łukasz Lew.
Actually, my own code is only some 360 lines, warts and all. I have put
a tarball with Lew's library and my code at
   http://www.lsd.dk/heikki/halgo-1.5.tgz
if you want to look at it. Being derived from a GPL work, it is under
GPL itself too, of course.

The 500k refers to the number of simulated random games from the given
position, equally divided among all legal moves that don't fill
one-point eyes. 500k sounds like a large number, but the games typically
use less than 5 minutes of their clock, so they are quite OK to play on
cgos. This is on my dual-core AMD desktop (using only one core).
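
To give a feel for how thin that budget gets, here is a sketch of an equal split (Python; the function name and the handling of the remainder are assumptions, as the post does not say what happens when the division is not exact):

```python
def playouts_per_move(total_playouts, candidate_moves):
    """Split a fixed playout budget evenly over the candidate moves.

    Any remainder is handed out one playout at a time to the first few
    moves; that detail is an assumption of this sketch.
    """
    if not candidate_moves:
        return {}
    share, extra = divmod(total_playouts, len(candidate_moves))
    return {
        move: share + (1 if i < extra else 0)
        for i, move in enumerate(candidate_moves)
    }


# Empty 9x9 board: all 81 points are candidates.
budget = playouts_per_move(500_000, list(range(81)))
print(min(budget.values()), max(budget.values()))
```

On an empty 9x9 board this works out to a little over 6,000 playouts per candidate move.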

I still suspect that 500k is nowhere near enough to get reliable results,
at least from the opening position, as the program wants to start at all
sorts of places. It has some preference for the center point (on 9x9), and
some tendency to avoid the first and second line moves, but nowhere near
enough to be predictable.

I even played a game on 19x19, and although the opening was kind of
surreal, it shaped up quite well. I overlooked an atari, and it took
advantage of that immediately, saving a group of maybe 20 stones. In the
end I won clearly, but Halgo had three separate live groups on the
board. It took 2.5 hours, and I was glad I had eliminated the silly
endgame moves.


In case you are interested in numbers, here are some:

Cpuinfo says
   vendor_id       : AuthenticAMD
   cpu family      : 15
   model           : 43
   model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
   stepping        : 1
   cpu MHz         : 1000.000
   cache size      : 512 KB
except that MHz will be 2000 when there is load on the machine, as there
is when playing. The system is Linux with Debian/testing. 

GCC says:
gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-mpfr --with-tune=i686
--enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)



Next I guess I will have a look at UCT, which seems to be the hottest
topic at the moment.

- Heikki



-- 
Heikki Levanto   "In Murphy We Turst"     heikki (at) lsd (dot) dk

