Lazarus uses a system very similar to the original MoGo policy as
documented in the paper.  However, I did find one significant
improvement: I used Rémi's ELO system to rate patterns, and I simply
throw out moves that match the weakest patterns in the play-outs.  In
the tree I also throw out these moves, but progressively: those moves
still get explored, just not near leaf nodes.
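
Roughly, the idea looks like this (just a sketch in Python, not the
actual Lazarus code - the pattern table, the board helpers and the
thresholds here are made-up placeholders):

  import random

  PRUNE_GAMMA = 0.1   # assumed cutoff: patterns rated below this count as "weakest"

  def playout_candidates(board, pattern_gamma):
      """Play-outs: simply throw out moves matching the weakest patterns.
      board.legal_moves() and board.pattern_at() are assumed helpers."""
      moves = [mv for mv in board.legal_moves()
               if pattern_gamma.get(board.pattern_at(mv), 1.0) >= PRUNE_GAMMA]
      return moves or list(board.legal_moves())  # never prune away every move

  def choose_playout_move(board, pattern_gamma):
      return random.choice(playout_candidates(board, pattern_gamma))

  def tree_candidates(board, pattern_gamma, node_visits, unprune_visits=100):
      """Tree: the same pruning, but progressive - weak-pattern moves are
      withheld near the leaves (few visits) and only come back into
      consideration once the node has been visited enough times."""
      if node_visits < unprune_visits:           # assumed visit threshold
          return playout_candidates(board, pattern_gamma)
      return list(board.legal_moves())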

I also found a minor improvement at some point by giving captures a
higher priority, but I don't remember the details.

I have not put a huge amount of energy into anything other than this.

Programs that play games tend to be highly idiosyncratic.  What works
today may not work tomorrow, or may work for me and not for you.  It
depends on what is already in your program, what isn't, how you
implemented it, and so on.  Some improvements work well together and
some do not cooperate with other changes.  The whole process is a bit
of a black art, not just the play-outs.

The very biggest problem of all is how to test a change.  It's not so
difficult in the early stages, where you are testing major improvements
and getting 100 ELO at a time.  But when you get to the point where you
are refining a program, you are talking about small but important
improvements.  You are looking for 10 or 20 ELO and hoping to put
several of them together to get 100 or 200.  But testing even a 20 ELO
improvement takes thousands of games to get a reliable measure.  If we
could measure 5 ELO improvements in 5 minutes, you can bet most of us
would be able to produce very strong programs.
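
To put some numbers on that: under the standard ELO model the expected
score for a difference of d points is 1/(1 + 10^(-d/400)), so a 20 ELO
edge is only about a 52.9% score, and the sampling noise over n games
shrinks like 0.5/sqrt(n).  A quick back-of-the-envelope check (plain
Python, nothing engine-specific):

  import math

  def expected_score(elo_diff):
      """Expected score of the stronger side under the ELO model."""
      return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

  def games_needed(elo_diff, z=1.96):
      """Games before a real edge of elo_diff clears ~95% sampling noise,
      using the worst-case standard error sqrt(p*(1-p)/n) <= 0.5/sqrt(n)."""
      edge = expected_score(elo_diff) - 0.5
      return math.ceil((z * 0.5 / edge) ** 2)

  for d in (5, 10, 20, 100):
      print(d, "ELO ->", round(100 * expected_score(d), 1), "%,", games_needed(d), "games")

That comes out to very roughly 50 games for a 100 ELO jump, over a
thousand games for 20 ELO, and close to twenty thousand for 5 ELO,
which is why the early big improvements are easy to confirm and the
later small ones are not.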

Some of the top computer chess guys have testing systems that play
50,000 games or more to measure the value of small improvements, such as
a weight adjustment.  They play those games on several computers (or
CPUs) and play relatively fast - I have heard of testing systems that
average a game per second per CPU at reasonably strong levels (they are
still playing at least master strength).

The problem is that if you test too fast, you are not really stressing a
Monte Carlo program or testing the sub-systems in the same way they
would be tested in real games.  Monte Carlo relies on gathering
statistics, and that seems to require games that last at least a few
seconds.  So unless you have a large bank of workstations it's
difficult to get enough games to reliably measure small improvements,
since this requires tens of thousands of games.
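
Just to illustrate the arithmetic (the 20,000 games, 10 seconds per
game and 4 CPUs below are assumed numbers for the example, not
measurements from any particular program):

  def test_hours(games=20_000, seconds_per_game=10, cpus=4):
      """Rough wall-clock cost of one experiment."""
      return games * seconds_per_game / cpus / 3600.0

  print(round(test_hours(), 1), "hours")                    # ~13.9 hours per candidate change
  print(round(test_hours(seconds_per_game=1), 1), "hours")  # ~1.4 hours at chess-style speeds

At a few seconds per game on a handful of cores, every small candidate
change costs the better part of a day to evaluate, which is exactly the
bottleneck described above.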


- Don




Gian-Carlo Pascutto wrote:
> Yamato wrote:
>> I guess the current top programs have a much better playout policy than
>> the classical MoGo-style one.
>>
>> The original policy of MoGo was:
>>
>> (1) If the last move is an Atari, plays one saving move randomly.
>> (2) If there are "interesting" moves in the 8 positions around the
>>     last move, plays one randomly.
>> (3) If there are the moves capturing stones, plays one randomly.
>> (4) Plays one random move on the board.
>>
>> I (and maybe many others) use it with some improvements, however it
>> will not be enough to catch up with the top programs.
>
> What improvements did you try? The obvious ones I know of are prioritizing
> saving and capturing moves by the size of the string.
>
> Zen appears quite strong on CGOS. Leela using the above system was
> certainly weaker.
>
>> Then I tested a lot of changes to the probability distributions, but
>> it was very hard to improve the strength.
>>
>> Any comments?
>
> I had the same problem, i.e. it seems almost impossible to improve the
> MoGo system by having a different pattern set for "interesting" moves,
> or even by varying the probability of "interesting" moves by pattern
> score.
>
> I tried 2 things:
>
> a) I extracted about 5000 positions with a known winner (determined
> by UCT) from CGOS games, and measured the Mean Square Error of the
> result of my playouts against the known result (also described in one
> of the MoGo papers). Then I applied a genetic algorithm to optimize
> the playout patterns.
>
> This worked, in the sense that the MSE measured over the 5000
> positions dropped. However, it did not produce a stronger program! I
> found that somewhat shocking.
>
> It makes me doubt the value of the MSE measure.
>
> b) I made a simple genetic algorithm that makes a random pool of a few
> hundred playout policies, picks 2 random parents, crosses over/mutates
> them into 2 children, plays a 10-game match between the 2 children with
> simulations = 100, and then keeps the winner.
>
> This did not produce anything interesting either. My best guess is
> that the match results are simply too random.
>
> So I did not find any way to automatically optimize the patterns.
>
> I finally improved my playouts by using Remi's ELO system to learn a
> set of "interesting" patterns, and just randomly fiddling with the
> probabilities (compressing/expanding) until something improved my
> program in self-play by about +25%. Not a very satisfying method or
> an exceptional result. There could be some other magic combination
> that is even better, or maybe not.
>
> I now got some improvement by "merging" steps (1), (2) and (3) in the
> MoGo system and using probabilities on that. It makes sense because the
> playouts won't try hopeless saving moves, for example.
>
> What is so frustrating is that the playouts are essentially black
> magic. I know of no way to automatically determine what is good and
> what is not, besides playing about 500 games between 2 strategies. The
> results are very often completely counterintuitive. There is no
> systematic way to improve.
>
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
