Quoting Peter Drake <[EMAIL PROTECTED]>:
UCB (and hence UCT) would treat the following sequences of wins (1) and
losses (0) the same:
01010101010101010101010101010101
00000000000000001111111111111111
11111111111111110000000000000000
I have two comments. Isn't the problem here that UCT will not search
the second sequence at all because it is so bad initially? So will
there ever be a situation like this? UCT will more likely continue to
sample the first and third sequenc settling for the first if their are
no change again. And when it finally discovers that sequene 2 is good
it will quickly choose it.
The second thing is that in practice you will rarely get any clear
patterns like this so how would you be able to detect any recency?
One simple trick is to always replay a move in the tree if it won the
last time the position was visited, and only use UCT for positions
where the last played moved lost. Should'nt this work like a dream if
sequence 2 truly goes to 100% winrate directly after the wirst win is
scored?
-Magnus
--
Magnus Persson
Berlin, Germany
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/