I tried this yesterday with K=10 and it seemed to make Many Faces weaker (84.2% +- 2.3 without vs 81.6% +- 1.7 with the change). That is not 95% confidence, but it is likely weaker. This is 19x19 vs gnugo with Many Faces using 8K playouts per move, 1000 games without and 2000 games with the change. I have the UCT exploration term, so perhaps this idea doesn't work when exploration is present. Or perhaps the K I tried is too large.
David

From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of Olivier Teytaud
Sent: Saturday, October 03, 2009 1:28 AM
To: computer-go
Subject: Re: [SPAM] Re: [SPAM] Re: [computer-go] Progressive widening vs unpruning

>> 4) regularized success rate (nbWins + K) / (nbSims + 2K)
>> (the original "progressive bias" is simpler than that)

> I'm not sure what you mean here. Can you explain a bit more?

Sorry for being unclear, I hope I'll do better below.

Instead of just "number of wins" divided by "number of simulations", we use "nb of wins + K" divided by "nb of simulations + 2K". This is similar to the "even game" heuristic previously cited; it avoids assigning a 0% success rate to a move that has been tested just once.

If you apply UCT with a constant of zero in front of the sqrt(log(N)/N_i) term, then such a regularization is necessary for showing consistency of UCT for two-player games. Even with non-zero "exploration terms", I guess this kind of regularization keeps the program from spending a very long time without looking at a move just because of a few bad first simulations.

This kind of detail is a bit boring, but I think K>0 is much better in many cases... well, maybe not for other implementations, depending on the other terms you have - our formula is so long now that I'm not able to write it in closed form :-)

By the way, K>0 is in my humble opinion a very good idea if you want to check that UCT with a positive constant has a good effect in your code. I feel that UCT looks great if K=0 just because of the "bad first simulation effect": with K=0 and without an exploration term, just losing the first few simulations can lead to the very bad situation in which a move is never tested again.

Best regards,
Olivier
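To make the "bad first simulation effect" concrete, here is a minimal Python sketch of the regularized success rate (nbWins + K) / (nbSims + 2K) under purely greedy selection, i.e. with a zero exploration constant. This is not the actual code of either program: the function names, the two-move setup, and the fixed outcome sequences are assumptions chosen for illustration. Move A is a good move that happens to lose its first simulation; move B wins only one simulation in five.

```python
from itertools import cycle


def regularized_rate(wins, sims, k):
    """Return (wins + K) / (sims + 2K); an unvisited move scores 0.5 when K > 0."""
    denom = sims + 2 * k
    if denom == 0:
        # K = 0 and no simulations yet: force a first visit (illustrative choice).
        return float("inf")
    return (wins + k) / denom


def greedy_visits(outcomes, k, steps):
    """Always simulate the move with the highest regularized rate (no exploration term).

    `outcomes` maps each move to a fixed sequence of results (1 = win, 0 = loss),
    so the run is deterministic. Returns the number of simulations per move.
    """
    stats = {move: [0, 0] for move in outcomes}  # move -> [wins, sims]
    streams = {move: cycle(seq) for move, seq in outcomes.items()}
    for _ in range(steps):
        best = max(stats, key=lambda m: regularized_rate(stats[m][0], stats[m][1], k))
        stats[best][0] += next(streams[best])
        stats[best][1] += 1
    return {move: sims for move, (wins, sims) in stats.items()}


# Hypothetical outcome sequences, not measured data.
OUTCOMES = {
    "A": [0] + [1] * 99,    # loses once, then always wins
    "B": [1, 0, 0, 0, 0],   # wins one simulation in five
}

visits_k0 = greedy_visits(OUTCOMES, k=0, steps=50)
visits_k1 = greedy_visits(OUTCOMES, k=1, steps=50)
print("K=0:", visits_k0)
print("K=1:", visits_k1)
```

With K=0, move A sits at 0/1 after its first loss and is never selected again, because B's rate stays above zero. With K=1, A's rate after one loss is 1/3 rather than 0, so A is retried once B's rate has dropped to that level. A real engine would add the exploration term and other biases on top of this, which may be why a large K interacts differently there.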