Re: [computer-go] Tweak to MCTS selection criterion

Don Dailey Sat, 06 Jun 2009 14:31:28 -0700

On Sat, Jun 6, 2009 at 5:07 PM, Michael Williams <michaelwilliam...@gmail.com> wrote:


> Another strategy to be considered is to not allow the thinking to cease
> until the maximum win rate and the maximum visit count agree on the same
> move. Obviously this requires some extra code to make sure you don't lose on
> time, etc.


I like the Pebbles idea, but this approach is probably preferable, with the
Pebbles rule as the fallback when time does not permit the extra work.
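
To make the three root-selection rules discussed in this thread concrete, here
is a minimal Python sketch. The MoveStats class, the function names, and the
numbers are all made up for illustration; none of it is taken from Pebbles or
any other program.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoveStats:
    """Root statistics for one candidate move (hypothetical layout)."""
    move: str
    wins: int
    visits: int

    @property
    def win_rate(self) -> float:
        # Guard against division by zero for unvisited moves.
        return self.wins / self.visits if self.visits else 0.0


def most_visits(stats: List[MoveStats]) -> MoveStats:
    """Conventional rule: the move with the most trials."""
    return max(stats, key=lambda s: s.visits)


def best_win_rate(stats: List[MoveStats]) -> MoveStats:
    """Highest win percentage; unreliable when a move has few trials."""
    return max(stats, key=lambda s: s.win_rate)


def most_wins(stats: List[MoveStats]) -> MoveStats:
    """The rule Brian describes for Pebbles: the move with the most wins."""
    return max(stats, key=lambda s: s.wins)


# Invented numbers, chosen so that the three rules disagree.
root = [
    MoveStats("A", wins=520, visits=1000),  # most trials
    MoveStats("B", wins=530, visits=900),   # more wins in fewer trials
    MoveStats("C", wins=9, visits=10),      # high rate on a tiny sample
]

print(most_visits(root).move, best_win_rate(root).move, most_wins(root).move)
```

With these made-up numbers, move C shows why raw win rate is risky, and move B
is the "more wins in fewer trials" case where the Pebbles rule departs from the
conventional one.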

Allocating extra thinking time is very wise here because the position clearly
needs to be resolved - a move that previously did not look good has emerged as
potentially best, and perhaps the current move with the most samples has
dropped in score enough to trigger this event. This could be due to trouble on
the horizon.
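
One way this could be wired into time control is sketched below in Python. It
is only a sketch under assumptions: choose_move, search_more, base_deadline and
hard_deadline are hypothetical names, and search_more stands in for whatever
runs another batch of playouts. The search continues past the normal budget
while the most-visited move and the highest-win-rate move disagree, but never
past a hard limit, and the final pick uses the most-wins rule.

```python
import time
from typing import Callable, Dict, Tuple

# move -> (wins, visits); a hypothetical layout for root statistics.
Stats = Dict[str, Tuple[int, int]]


def choose_move(
    stats: Stats,
    search_more: Callable[[Stats], None],
    base_deadline: float,
    hard_deadline: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Search until base_deadline, then keep going only while the two
    criteria disagree, and never past hard_deadline."""

    def by_visits() -> str:
        return max(stats, key=lambda m: stats[m][1])

    def by_rate() -> str:
        return max(stats, key=lambda m: stats[m][0] / max(stats[m][1], 1))

    while True:
        now = clock()
        if now >= hard_deadline:
            break  # out of time: stop even if the criteria disagree
        if now >= base_deadline and by_visits() == by_rate():
            break  # normal budget spent and both criteria agree
        search_more(stats)

    # If one move leads in both visits and win rate, it also leads in wins
    # (wins = rate * visits), so "most wins" returns the agreed move and
    # doubles as the fallback when time ran out.
    return max(stats, key=lambda m: stats[m][0])


# Toy usage with a fake clock and a stand-in search step (both invented).
demo_stats: Stats = {"A": (520, 1000), "B": (530, 900)}
ticks = iter(range(100))


def fake_search(s: Stats) -> None:
    # Pretend every new batch of 100 playouts goes to B and wins 60 of them.
    wins, visits = s["B"]
    s["B"] = (wins + 60, visits + 100)


chosen = choose_move(demo_stats, fake_search, base_deadline=0,
                     hard_deadline=50, clock=lambda: next(ticks))
print(chosen, demo_stats)
```

Passing the clock in as a parameter is just a convenience so the loop can be
exercised without real waiting; a real engine would derive the two deadlines
from its remaining game time.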

- Don




> Brian Sheppard wrote:
>
>> When a UCT search is completed, the usual selection criterion is
>> "choose the move that has the most trials." This is more stable
>> than choosing the move that has the highest percentage of wins,
>> since it is possible to have an unreliably high percentage if the
>> number of trials is small.
>>
>> I have a small tweak to that criterion. Pebbles uses "choose the
>> move that has the most wins." This rule selects the same move as
>> the conventional criterion in almost every case. The reason why
>> Pebbles' rule is superior is revealed in the case where the moves
>> differ.
>>
>> When Pebbles chooses a different move than the conventional criterion,
>> it is because Pebbles' move has more wins in fewer trials. When that
>> happens, Pebbles' move would inevitably become the move with the most
>> trials if searching were to continue. So there is actually no downside.
>> Of course, the upside is minor, too.
>>
>> For validation, Pebbles has been using both strategies on CGOS games.
>> At present, the conventional selection strategy has won 341/498 = 68.47%.
>> Pebbles' strategy has won 415/583 = 71.18%. This isn't statistically
>> conclusive or anything (0.7 standard deviations; we would need 4 to 8
>> times as many trials for strong statistical evidence). But Pebbles'
>> strategy should be better by a small amount, and it has been, so I
>> present it to you with confidence.
>>
>> Best,
>> Brian
>>
>> _______________________________________________
>> computer-go mailing list
>> computer-go@computer-go.org
>> http://www.computer-go.org/mailman/listinfo/computer-go/
