I tried this yesterday with K=10 and it seemed to make Many Faces weaker
(84.2% +- 2.3 without vs 81.6% +- 1.7 with): not significant at 95%
confidence, but likely weaker.  This is 19x19 vs gnugo with Many Faces using
8K playouts per move, 1000 games without and 2000 games with the change.  I
have the UCT exploration term, so perhaps with exploration this idea doesn't
work.  Or perhaps the K I tried is too large.
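
A quick way to check the "not 95% confidence" remark is a two-proportion
z-test. The sketch below assumes the quoted +- figures are about two standard
errors of a binomial win rate (they appear consistent with that for 1000 and
2000 games) and uses a plain normal approximation. The function names are
hypothetical, not taken from any Go program.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return both standard errors and the unpooled z statistic for p1 - p2."""
    se1 = math.sqrt(p1 * (1 - p1) / n1)
    se2 = math.sqrt(p2 * (1 - p2) / n2)
    z = (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return se1, se2, z

def two_sided_p(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# 84.2% over 1000 games (without the change) vs 81.6% over 2000 games (with it).
se_without, se_with, z = two_proportion_z(0.842, 1000, 0.816, 2000)
p_value = two_sided_p(z)
print(f"2*SE without: {2 * se_without:.3f}, 2*SE with: {2 * se_with:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumptions z comes out at roughly 1.8 and the two-sided p-value
at roughly 0.07, which is short of the usual 0.05 threshold but still
suggestive.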

 

David

 

From: computer-go-boun...@computer-go.org
[mailto:computer-go-boun...@computer-go.org] On Behalf Of Olivier Teytaud
Sent: Saturday, October 03, 2009 1:28 AM
To: computer-go
Subject: Re: [SPAM] Re: [SPAM] Re: [computer-go] Progressive widening vs
unpruning

 

 


4) regularized success rate (nbWins + K) / (nbSims + 2K)
(the original "progressive bias" is simpler than that)

 

I'm not sure what you mean here. Can you explain a bit more?

 


Sorry for being unclear, I hope I'll do better below.

Instead of just "number of wins" divided by "number of simulations",
we use "nb of wins + K" divided by "nb of simulations + 2K";
this is similar to the "even game" heuristic previously cited;
it avoids assigning a 0% success rate to a move that was tested just once
and lost.
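
As a minimal sketch of the formula above (the function name and argument
names are hypothetical, chosen only for illustration):

```python
def regularized_win_rate(nb_wins, nb_sims, k):
    """Success rate with K virtual wins and K virtual losses added.

    With k > 0 an unvisited move starts at exactly 0.5, and a single lost
    simulation no longer pushes the estimate to 0.
    """
    if nb_sims + 2 * k == 0:
        raise ValueError("undefined for k = 0 and no simulations")
    return (nb_wins + k) / (nb_sims + 2 * k)

# One simulation, lost: raw rate vs. regularized rate with K = 10.
raw = regularized_win_rate(0, 1, 0)
smoothed = regularized_win_rate(0, 1, 10)
print(raw, smoothed)
```

With K = 10, a move that lost its only simulation gets 10/21, about 0.476,
instead of 0; as the number of simulations grows, the estimate converges to
the plain win rate.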

If you apply UCT with constant zero in front of the "sqrt(log(N)/N_i)"
term, then such a regularization is necessary for showing consistency of UCT
for two-player games; and even with non-zero "exploration terms", I guess
this kind of regularization keeps the program from spending a very long time
without looking at a move just because of a few bad first simulations. This
kind of detail is a bit boring, but I think K>0 is much better in many
cases... well, maybe not for other implementations, depending on the other
terms you have - our formula is so long now that I'm not able to write it in
closed form :-)
By the way, K>0 is in my humble opinion a very good idea if you want to
check that UCT with a positive constant has a good effect in your code - I
feel that UCT looks great when K=0 just because of the "bad first simulation
effect" - with K=0 and without an exploration term, just losing the first few
simulations can lead to the very bad situation in which a move is never
tested again.
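
The "bad first simulation effect" can be illustrated with a toy example. This
is only a sketch under contrived assumptions: two hypothetical moves with
scripted (non-random) outcomes, each tried once, then purely greedy selection
with no exploration term. It is not the selection rule of any actual program.

```python
from itertools import chain, repeat

def regularized_rate(wins, sims, k):
    # (wins + K) / (sims + 2K); sims >= 1 here because each move is tried once first
    return (wins + k) / (sims + 2 * k)

def run_greedy(k, total_sims=100):
    """Greedy selection (exploration constant zero) over two scripted moves."""
    # Scripted outcomes, 1 = win, 0 = loss (an assumption made for determinism):
    # "good" loses its first simulation and wins every later one,
    # "bad" wins its first simulation and loses every later one.
    outcomes = {
        "good": chain([0], repeat(1)),
        "bad": chain([1], repeat(0)),
    }
    wins = {move: 0 for move in outcomes}
    sims = {move: 0 for move in outcomes}
    for _ in range(total_sims):
        untried = [move for move in outcomes if sims[move] == 0]
        if untried:
            move = untried[0]
        else:
            move = max(outcomes,
                       key=lambda m: regularized_rate(wins[m], sims[m], k))
        wins[move] += next(outcomes[move])
        sims[move] += 1
    return sims

print("K=0:", run_greedy(0))
print("K=2:", run_greedy(2))
```

With K=0 the good move sits at 0/1 forever, because the other move's rate 1/n
never reaches zero, so the good move is never tested again. With K=2 the other
move's regularized rate drops below the good move's 2/5 after a few losses,
and the good move gets its second chance.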

Best regards,
Olivier

 

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
