My original example was unrealistic and on the extreme side to make a point.  
However if there are nodes with say 7/10, 12/20, and 50/100 how should they be 
ranked?  In some sense, the first one seems promising since we've only searched 
just a few nodes, yet we are mainly seeing wins (granted, the last one 
expresses more confidence).  To me, this would seem like an argument that 
favors win rate over win count (and we don't always have the benefit of 
extending the search for these "low confidence nodes" until the law of large 
numbers sorts things out).

Perhaps there is an argument that the UCB formula "won't generally let this 
happen" since it takes into consideration both win rate and tries to increase 
confidence by promoting the visit of nodes with low visit counts.  Still, I can 
envision cases where time is running out, and UCT has just recently discovered 
a promising new branch.

If the choice were 1/2 vs. 50/100, the second expresses more confidence yet one 
could argue we should search the first move more (given that time is available 
to do so) to find out if it is, in fact, better than .5 (since being low 
confidence, it has a reasonable chance of being better than the 2nd).  So the 
general question becomes how to effectively trade off win rate with win rate 
confidence?

FWIW, the sample code at
http://mcts.ai/code/python.html

Does something completely different, it selects the root node which has the 
most number of visits:

"return sorted(rootnode.childNodes, key = lambda c: c.visits)[-1].move # return 
the move that was most visited"

 
 On Thu, Jul 03, 2014 at 10:57:17AM +0200, Stefan Kaitschick
 wrote:
 > On Thu, Jul 3, 2014 at 9:00 AM, Darren Cook <[email protected]>
 wrote:
 > 
 > >
 > > If you had a choice between a 1% 65,000-wins move
 and a 70% 7-wins move,
 > > MCTS will keep exploring the 70% move, until it
 either reaches 65,001
 > > wins, and can be chosen, or the winning percentage
 comes down to 1% also.
 > >
 > > BTW, that implies it would be very difficult to
 ever reach the situation
 > > you describe, as 1% win rate moves wouldn't be
 given 650,000 trials
 > > (unless all other moves on the board are equally
 bad, i.e. the game is
 > > clearly lost).
 > >
 > 
 > What I dont understand, is why the variation that's
 trying to catch up has
 > to absolutely overtake the leader.
 > Shouldn't there be a substantial bonus for a late high
 success rate?
 
 The way this bonus is often implemented is that if a
 disparity between
 the move with the highest winrate and the move with the
 highest number
 of simulations is different, the program spends some extra
 time
 searching to give a chance to resolve this.
 
   (Conversely, if some move has been simulated so much
 more that no
 other move can overtake it in the number of simulations in
 the alloted
 time anymore, the search can be stopped early.)
 
 P.S.: No, I don't like Japanese byoyomi for Pachi. ;-)
 
 -- 
            
     Petr Baudis
     Life is short, the craft long,
 opportunity fleeting, experiment
     treacherous, judgment difficult.  --
 Hippocrates
 
 
 ------------------------------
 
 _______________________________________________
 Computer-go mailing list
 [email protected]
 http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
 
 End of Computer-go Digest, Vol 54, Issue 4
 ******************************************
 
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Reply via email to