Quoting Peter Drake <[EMAIL PROTECTED]>:

UCB (and hence UCT) would treat the following sequences of wins (1) and
losses (0) the same:

01010101010101010101010101010101
00000000000000001111111111111111
11111111111111110000000000000000

I have two comments. First, isn't the problem here that UCT will not search the second sequence at all, because it looks so bad initially? Will a situation like this ever arise? UCT will more likely continue to sample the first and third sequences, settling for the first if there is no further change. And when it finally discovers that sequence 2 is good, it will quickly choose it.
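To make this concrete, here is a small sketch of how UCB1 would score the three sequences quoted above. It assumes the textbook UCB1 formula with exploration constant sqrt(2); the parent visit count of 96 and the helper names (ucb1, score_after) are made up for illustration and are not taken from any particular program. After all 32 samples the scores are identical, but after only 16 samples they differ a lot, which is why I doubt sequence 2 would get sampled this evenly in the first place.

```python
import math

# The three win (1) / loss (0) sequences from the quoted message.
SEQUENCES = {
    "seq1": "01010101010101010101010101010101",
    "seq2": "00000000000000001111111111111111",
    "seq3": "11111111111111110000000000000000",
}

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1: mean reward plus an exploration bonus.

    Only the win count and the visit count enter the formula,
    so the order of the results is invisible to it.
    """
    mean = wins / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def score_after(seq, n, parent_visits):
    """UCB1 score of a move after its first n results in seq."""
    prefix = seq[:n]
    return ucb1(prefix.count("1"), n, parent_visits)

PARENT_VISITS = 96  # hypothetical: 3 moves with 32 visits each

# Scores once every result is in, and scores halfway through.
full = {name: score_after(s, 32, PARENT_VISITS) for name, s in SEQUENCES.items()}
half = {name: score_after(s, 16, PARENT_VISITS) for name, s in SEQUENCES.items()}

for name in SEQUENCES:
    print(name, "after 16:", round(half[name], 3), "after 32:", round(full[name], 3))
```

Note that this scores each sequence in isolation. In a real search the moves compete for visits, so a move that starts with 16 straight losses would normally receive far fewer than 32 samples.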

The second thing is that in practice you will rarely get any clear patterns like this, so how would you be able to detect any recency?

One simple trick is to always replay a move in the tree if it won the last time the position was visited, and only use UCT for positions where the last played move lost. Shouldn't this work like a dream if sequence 2 truly goes to a 100% win rate directly after the first win is scored?
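A rough sketch of the trick, to show what I mean. This is not code from any existing program: the Node class, its fields, and the run helper are invented for illustration, and the fallback is plain UCB1 with exploration constant sqrt(2). Each position remembers the last move played and whether it won; if it won, the move is replayed, otherwise the usual UCB1 choice is made.

```python
import math

class Node:
    """One position in the tree, with per-move statistics (hypothetical layout)."""

    def __init__(self, moves):
        self.wins = {m: 0 for m in moves}
        self.visits = {m: 0 for m in moves}
        self.last_move = None   # move played on the previous visit
        self.last_won = False   # did that playout end in a win?

    def ucb1_choice(self, c=math.sqrt(2)):
        """Ordinary UCB1 selection; unvisited moves are tried first."""
        for move, n in self.visits.items():
            if n == 0:
                return move
        total = sum(self.visits.values())

        def score(move):
            n = self.visits[move]
            return self.wins[move] / n + c * math.sqrt(math.log(total) / n)

        return max(self.visits, key=score)

    def select(self):
        """Replay the last move if it won; otherwise fall back to UCB1."""
        if self.last_move is not None and self.last_won:
            return self.last_move
        return self.ucb1_choice()

    def update(self, move, won):
        """Record a playout result and remember it for the next visit."""
        self.visits[move] += 1
        self.wins[move] += int(won)
        self.last_move = move
        self.last_won = won

def run(node, outcomes, steps):
    """Feed scripted results to the node; outcomes[move][k] is the k-th result."""
    played = []
    for _ in range(steps):
        move = node.select()
        won = outcomes[move][node.visits[move]] == "1"
        node.update(move, won)
        played.append(move)
    return played

# Move "a" follows sequence 3, move "b" follows sequence 2.
outcomes = {
    "a": "11111111111111110000000000000000",
    "b": "00000000000000001111111111111111",
}
played = run(Node(["a", "b"]), outcomes, 32)
print("".join(played))
```

The scripted outcomes are only there to exercise the selection rule. In a real search the results come from playouts, and the win and visit counts are still updated as usual, so UCT takes over again as soon as the replayed move loses.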

-Magnus

--
Magnus Persson
Berlin, Germany
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
