Heikki,

This is very similar to what AnchorMan on CGOS does. At the end of each random simulation I keep the same statistics on each point of the board, and I use them to improve the move selection algorithm. I call this special board an "ownership map" for obvious reasons. I just divide each value by the number of simulations run.
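In sketch form the bookkeeping is nothing more than this (a simplified illustration, not AnchorMan's actual code - the names and the 9x9 size are just for the example):

    // Ownership map sketch: after each playout, credit every point
    // to the color that owns it at the end of the random game.
    const int AREA = 81;                  // 9x9 board
    double ownership[AREA] = {0};

    // final_owner[p] is +1 if black owns point p at the end of the
    // playout, -1 if white does.
    void accumulate(const int final_owner[AREA]) {
        for (int p = 0; p < AREA; p++)
            ownership[p] += final_owner[p];
    }

    // Call once after all playouts: turns the totals into per-point
    // ownership estimates in [-1, +1].
    void normalize(int num_playouts) {
        for (int p = 0; p < AREA; p++)
            ownership[p] /= num_playouts;
    }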
It's not clear from your email what you do with this information, however. AnchorMan uses the ownership map to give the final move choice a little bias. In my testing the ownership map improved the program significantly, but it was tricky to get right. For reference, AnchorMan does 5,000 simulations per move and is rated exactly 1500.0.

AnchorMan is basic MC too, but some things about it do not match the experiences of others:

1. Others have tried to use my idea to bias moves and report no improvement. I think this is because it's easy to get wrong. I did weeks of experimentation and found that if a bias was too strong it actually weakened the program. The secret, if you use the ownership map to bias the choices, is to be very conservative and to experiment a lot. I bias edge moves, self-atari, captures, and moves into heavily owned territory, but these biases are applied carefully based on the ownership map. For example, I do not bias ALL self-atari moves, just the ones that cross a certain threshold of enemy ownership. The bias is not a veto; it's a gentle encouragement or discouragement toward certain moves in certain situations. (There is a rough sketch of this after point 2.)

2. I use a hybrid form of all-moves-as-first, and others have reported no improvement from it. The behavior I get is that it plays much stronger at low simulation counts, and in extensive testing I could not find a level high enough for the version without all-as-first to play as strongly. So for me there is no reason whatsoever to stop using all-as-first, even at ridiculous levels. Intuitively, all-as-first should play weaker once you are doing enough simulations, but the combination of tricks I'm using somehow negates this. What I do should really be called some-as-first, because I stop tallying moves at about 5/8 of the simulation length.
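To make point 1 concrete, the selection bias has roughly this shape (an illustrative sketch only - the threshold and bias numbers are made up for the example, and the values AnchorMan actually uses took weeks of tuning):

    #include <cmath>

    // 'win_rate' is the move's plain Monte Carlo score,
    // 'ownership[]' is the averaged map from black's point of view,
    // and 'color' is +1 for black to move, -1 for white.
    double biased_score(double win_rate, int move, int color,
                        const double ownership[], bool is_self_atari) {
        double score = win_rate;
        double enemy_own = -color * ownership[move];

        // Bias self-atari only when the point is heavily
        // enemy-owned, not all self-atari moves.
        if (is_self_atari && enemy_own > 0.8)
            score -= 0.02;

        // Gently discourage playing inside settled territory.
        if (std::fabs(ownership[move]) > 0.9)
            score -= 0.01;

        return score;   // never a veto, just a nudge
    }

The final move is then just the legal move with the best biased score, and the edge and capture biases are more terms of the same kind.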
AnchorMan even beats Lazarus, which is about 1800 strength, at fast levels. For instance, Lazarus cannot even begin to compete with AnchorMan if it plays as fast as AnchorMan does on CGOS. Unfortunately, this type of algorithm is not scalable; AnchorMan will never get much better. (I think it gets close to 1600 if it's given something like 200,000 simulations.)

There are some interesting enhancements that can push this a little farther without getting into the more complex UCT stuff. I can't remember who did it, but someone produced something over 1600 doing basic Monte Carlo while simulating a 2-ply search. I can't quite remember the exact details, but it was real simple. It might have been Lukasz Lew, I'm not sure. (I think they kept a 2-dimensional array of moves and their responses, something like that.) The details are in my email somewhere.

I also have an older program (which I cannot find) that was quite interesting: it evolved a playing strategy using PBIL, similar to some of the original papers on using Monte Carlo with Go. CGOS wasn't around then, but I think that program was pretty strong, at least as strong as AnchorMan, maybe stronger.

- Don

On Tue, 2007-02-27 at 21:38 +0100, Heikki Levanto wrote:
> Hi,
>
> I thought I'd report a small change I have made to the plain MC
> algorithm. I have been unhappy with the fact that the result of the
> game gets pressed into one bit (win/lose), and all other information
> is discarded. This leads to silly endgame moves, since the algorithm
> sees no difference between a 1-point win and a 100-point win.
>
> After thinking a bit about this, I decided to keep the whole board
> image at the end. In other words, I have one board-sized array of
> ints, and at the end, when scoring the result of a simulation, I add
> +1 to every point that belongs to black, and -1 to every point that
> belongs to white. Finally I divide each value by the number of
> simulations I have done.
>
> When evaluating the simulations, I sum up the points on this board.
> A simple summing didn't work so well; I thought squaring the numbers
> before summing would be better, but in fact it was worse. Now I end
> up with the following:
>     sum += sign(v) * pow(fabs(v), exp);
> where exp is something like 0.5. I also ran one player with exp=0.8,
> and it seemed to perform about equally well.
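(If I read this right, the whole evaluation boils down to something like the following. This is my reconstruction of what you describe, not your actual code:

    #include <cmath>

    // 'own[p]' is the averaged ownership of point p in [-1, +1] from
    // black's point of view; 'exponent' is the exp in the formula
    // quoted above.
    double evaluate(const double own[], int area, double exponent) {
        double sum = 0.0;
        for (int p = 0; p < area; p++) {
            double v = own[p];
            double sgn = (v > 0) - (v < 0);           // -1, 0 or +1
            sum += sgn * std::pow(std::fabs(v), exponent);
        }
        return sum;   // positive favors black
    }

With exponent = 1 this is just the raw score margin, and as the exponent approaches 0 it degenerates into a pure count of owned points, the sign(v) case, so the pow versions sit in between - which fits the version list below, where the in-between exponents did best.)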
> As a side benefit, I get to know which points clearly belong to
> black or white, and can easily skip unnecessary endgame moves
> (filling own territory into one-point eyes, making silly invasions
> into enemy territory, etc). I believe this could be used for much
> more clever pruning, but I have not (yet?) done so.
>
> The best of these experiments, Halgo-1.5-500k, reached almost 1400
> elo on CGOS, which I think is pretty good for plain MC without any
> trickery. Here are my notes on the versions I had there:
>
> Halgo1          - 1250 - the all-moves-as-first one
> Halgo-1.0-500k  -  477 - Traditional MC. Something wrong with it?
> Halgo-1.1-500k  - 1280 - Simple MC with area evaluation
> Halgo-1.2-500k  - 1053 - Same as 1.1, but with a non-linear scoring
>                          function (s*v*v)
> Halgo-1.3-500k  - 1400 - Same as 1.1, but with pow(v,0.8) as the
>                          summing function
> Halgo-1.4-500k  -  492 - Same as 1.1, but with plain sign(v) as the
>                          summing function
> Halgo-1.5-500k  - 1390 - The pow function again, with 0.5. About the
>                          same as 1.3
>
> I am not sure if there was something wrong with Halgo-1.0; it was
> supposed to play 'traditional' MC, summing just the winner of the
> games.
>
> My code is heavily based on the Efficient Go Library by Lukasz Lew.
> Actually, my own code is only some 360 lines, warts and all. I have
> put a tarball with Lew's library and my code at
> http://www.lsd.dk/heikki/halgo-1.5.tgz
> if you want to look at it. Being derived from a GPL work, it is
> under the GPL itself too, of course.
>
> The 500k refers to the number of simulated random games from the
> given position, divided equally among all legal moves that don't
> fill one-point eyes. 500k sounds like a large number, but the games
> typically use less than 5 minutes of their clock, so they are quite
> OK to play on CGOS. This is on my dual-core AMD desktop (using only
> one core).
>
> I still suspect that 500k is nowhere near enough to get reliable
> results, at least from the opening position, as the program wants to
> start in all sorts of places. It has some preference for the center
> point (on 9x9) and some tendency to avoid first- and second-line
> moves, but nowhere near enough to be predictable.
>
> I even played a game on 19x19, and although the opening was kind of
> surreal, it shaped up quite well. I overlooked an atari, and it took
> advantage of that immediately, saving a group of maybe 20 stones. In
> the end I won clearly, but Halgo had three separate live groups on
> the board. It took 2.5 hours, and I was glad I had eliminated the
> silly endgame moves.
>
> In case you are interested in numbers, here are some. Cpuinfo says:
>
>     vendor_id  : AuthenticAMD
>     cpu family : 15
>     model      : 43
>     model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
>     stepping   : 1
>     cpu MHz    : 1000.000
>     cache size : 512 KB
>
> except that the MHz will be 2000 when there is load on the machine,
> as there is when playing. The system is Linux with Debian/testing,
> and the compiler is gcc 4.1.2 20061115 (prerelease)
> (Debian 4.1.1-21), targeting i486-linux-gnu with the posix thread
> model.
>
> Next I guess I will have a look at UCT; it seems to be the hottest
> topic at the moment.
>
> - Heikki

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/