On Sun, Mar 17, 2013 at 10:17:30PM +0100, Olivier Teytaud wrote:
> > I was surprised that many MC programs are not UCT anymore.
> > UCB = (wins / games) + C*sqrt( log(all_games) / games )
> > But in Many Faces of Go (MFG), Crazy Stone (CS), Pachi and Fuego, C = 0.
> > So they use something like this:
> > UCB_RAVE = (1-beta)*(wins / games) + beta*(rave_wins / rave_games) + somebias
> >
>
> I think that in many UCT implementations, C was so small that it was
> effectively C=0.
>
> In fact, wins/games is not asymptotically consistent: a move at 0/1 (its
> only playout lost) is never tried again as long as another move has a
> score >0.
> But "(wins+K)/(games+2K)" for any K>0 makes MCTS consistent. We've worked
> on this in http://hal.inria.fr/inria-00437146/ .
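A minimal Python sketch of the scores quoted above, assuming 0/1 playout results. beta and the bias are passed in as plain numbers because every program computes them its own way. The function names, the default K = 1 and the scripted playout results are hypothetical, chosen only to reproduce the 0/1 case Olivier describes: with C = 0 and plain wins/games, move "A" (lost its first playout) is never revisited while "B" keeps a positive score. With (wins+K)/(games+2K), "A" is revisited once B's estimate drops to K/(1+2K).

```python
import math
from typing import Callable, Dict, List


def ucb(wins: float, games: int, all_games: int, c: float) -> float:
    """Plain UCB score; with c = 0 it reduces to wins / games."""
    return wins / games + c * math.sqrt(math.log(all_games) / games)


def ucb_rave(wins: float, games: int, rave_wins: float, rave_games: int,
             beta: float, bias: float = 0.0) -> float:
    """(1-beta)*(wins/games) + beta*(rave_wins/rave_games) + bias.

    beta and bias are inputs here; how programs compute them is not modelled.
    """
    return ((1 - beta) * (wins / games)
            + beta * (rave_wins / rave_games) + bias)


def smoothed_mean(wins: float, games: int, k: float = 1.0) -> float:
    """(wins + K) / (games + 2K) for some K > 0 (K = 1 is an arbitrary default)."""
    return (wins + k) / (games + 2 * k)


def run_greedy(score: Callable[[float, int], float],
               outcomes: Dict[str, List[int]],
               n_playouts: int) -> Dict[str, int]:
    """Greedy (C = 0) move selection with scripted 0/1 playout results.

    Every move gets one playout first; after that, the move with the
    highest score is played. Returns the number of playouts per move.
    """
    wins = {m: 0 for m in outcomes}
    games = {m: 0 for m in outcomes}

    def play(m: str) -> None:
        wins[m] += outcomes[m][games[m]]  # next scripted result for move m
        games[m] += 1

    for m in outcomes:  # initial playout for every move
        play(m)
    for _ in range(n_playouts - len(outcomes)):
        play(max(outcomes, key=lambda m: score(wins[m], games[m])))
    return games


# Hypothetical script: "A" loses its first playout and wins afterwards;
# "B" wins its first playout by luck and loses every later one.
script = {"A": [0] + [1] * 49, "B": [1] + [0] * 49}
plain = run_greedy(lambda w, g: w / g, script, 50)  # UCB with C = 0
smoothed = run_greedy(smoothed_mean, script, 50)
```

Here `plain` gives "A" exactly one playout, because B's wins/games never falls to 0. With K = 1, B's smoothed estimate 2/(n+2) falls to 1/3, the value of "A" at 0/1, after a few playouts, and from then on "A" is explored again.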
Hi!
I believe this is also equivalent to the "even game prior" (seeding each
node with K virtual wins and K virtual losses) described earlier (maybe
even in your paper :)? I think many programs use something like that.
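A minimal sketch of why the two coincide. It assumes the even game prior means seeding every new node with 2K virtual playouts, K of them won. Programs that weight or decay the prior differently are not modelled. MoveStats and seeded_value are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MoveStats:
    """Hypothetical per-move win/playout counters."""
    wins: float = 0.0
    games: float = 0.0

    @classmethod
    def with_even_game_prior(cls, k: float) -> "MoveStats":
        # Seed with 2K virtual playouts, K of them won, so the value starts at 0.5.
        return cls(wins=k, games=2 * k)

    def update(self, result: int) -> None:
        # result is 1 for a won playout, 0 for a lost one
        self.wins += result
        self.games += 1

    def value(self) -> float:
        return self.wins / self.games


def seeded_value(results: Iterable[int], k: float) -> float:
    """Value of a move after `results`, starting from the even game prior."""
    stats = MoveStats.with_even_game_prior(k)
    for r in results:
        stats.update(r)
    return stats.value()


results = [1, 0, 0, 1, 1]
k = 2.0
# Same number as (wins + K) / (games + 2K) computed from the real playouts only.
assert math.isclose(seeded_value(results, k),
                    (sum(results) + k) / (len(results) + 2 * k))
```

The only difference is bookkeeping. The prior stores the K virtual results in the counters, while the formula adds them at scoring time.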
--
Petr "Pasky" Baudis
For every complex problem there is an answer that is clear,
simple, and wrong. -- H. L. Mencken
_______________________________________________
Computer-go mailing list
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go