Re: [computer-go] How to design the stronger playout policy?

Yamato Sat, 05 Jan 2008 04:31:41 -0800

Gian-Carlo Pascutto wrote:
>What improvements did you try? The obvious ones I know are prioritizing 
>saving and capturing moves by the size of the string.
>
>Zen appears quite strong on CGOS. Leela using the above system was 
>certainly weaker.


I use a static ladder search in playouts. For example, if a move that
matches a 3x3 pattern can be captured in a ladder, it is not treated as
interesting. Of course such a rule makes the program slower, but I
believe it is an improvement.
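
A minimal sketch of that rule, in Python for illustration only. The ladder
reader itself is not shown: `capturable_in_ladder` is a hypothetical
callback standing in for the static ladder search, and `pick_playout_move`
and its parameters are names invented for this example, not Zen's code.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple

Move = Tuple[int, int]  # (row, column); an assumed representation


def pick_playout_move(
    pattern_moves: Iterable[Move],
    fallback_moves: List[Move],
    capturable_in_ladder: Callable[[Move], bool],
    rng: random.Random,
) -> Optional[Move]:
    """Pick a pattern move that survives the ladder check, else any fallback move."""
    # Drop pattern moves whose stone could be captured in a ladder.
    interesting = [m for m in pattern_moves if not capturable_in_ladder(m)]
    if interesting:
        return rng.choice(interesting)
    if fallback_moves:
        return rng.choice(fallback_moves)
    return None  # nothing to play: treat as a pass
```

The ladder check runs only on the few moves that already matched a
pattern, which keeps the extra cost per playout move small.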

>I finally improved my playouts by using Remi's ELO system to learn a set 
>of "interesting" patterns, and just randomly fiddling with the 
>probabilities (compressing/expanding) until something improved my 
>program in self-play with about +25%. Not a very satisfying method or an 
>exceptional result. There could be some other magic combination that is 
>even better, or maybe not.

I have also implemented Remi's Minorization-Maximization algorithm,
but I could not work out how to use its results to improve playing
strength. Would you explain the details of your playout policy?
Do you use only 3x3 patterns?
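
One possible reading of the "compressing/expanding" in the quote above,
sketched in Python. In Remi's model a move is chosen with probability
proportional to its strength (gamma). The exponent `alpha` is my own
assumption about how the probabilities might be compressed or expanded,
and the function names are invented for this example.

```python
import random
from typing import Dict


def move_probabilities(gammas: Dict[str, float], alpha: float = 1.0) -> Dict[str, float]:
    """Probability of each move, proportional to gamma ** alpha.

    alpha < 1 compresses the distribution toward uniform,
    alpha > 1 expands it toward the strongest move,
    alpha = 1 uses the learned gammas unchanged.
    """
    scaled = {move: gamma ** alpha for move, gamma in gammas.items()}
    total = sum(scaled.values())
    return {move: value / total for move, value in scaled.items()}


def sample_move(gammas: Dict[str, float], alpha: float, rng: random.Random) -> str:
    """Draw one move according to the adjusted probabilities."""
    probs = move_probabilities(gammas, alpha)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]
```

With gammas of 1 and 3 for two moves, alpha = 1 gives probabilities 0.25
and 0.75, alpha = 0 gives 0.5 each, and alpha = 2 gives 0.1 and 0.9.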

>What is so frustrating is that the playouts are essentially black magic. 
>I know of no way to automatically determine what is good and not 
>besides playing about 500 games between 2 strategies. The results are 
>very often completely counterintuitive. There is no systematic way to 
>improve.

Yes. In addition, a big problem is that testing policies is very
time-consuming. I think at least 1000 games with 3000 or more playouts
per move are needed to judge whether a change is good or bad.
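
To give a rough feel for why so many games are needed, here is a small
Python sketch of the statistical error on a measured win rate. It assumes
independent games with no draws and uses the normal approximation to the
binomial; the function names are invented for this example.

```python
import math
from typing import Tuple


def win_rate_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for the true win rate."""
    p = wins / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return p - half_width, p + half_width


def elo_difference(win_rate: float) -> float:
    """Elo difference implied by a win rate under the usual logistic model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))
```

Under these assumptions a 50% result is known only to within about
+/-4.4% after 500 games and about +/-3.1% after 1000 games, so a change
worth a few percent is hard to separate from noise.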

--
Yamato
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
