Re: [computer-go] UCB/UCT and moving targets

Magnus Persson Thu, 26 Jun 2008 01:36:28 -0700

Quoting Peter Drake <[EMAIL PROTECTED]>:

> UCB (and hence UCT) would treat the following sequences of wins (1) and
> losses (0) the same:
>
> 01010101010101010101010101010101
> 00000000000000001111111111111111
> 11111111111111110000000000000000
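Why UCB cannot tell these apart can be checked directly. UCB1, the selection formula used inside UCT, scores a move as its mean result plus an exploration bonus, and both terms depend only on how many playouts were made and how many were won, never on their order. A minimal Python sketch, assuming the standard UCB1 formula with c = sqrt(2) and a hypothetical parent visit count of 100:

```python
import math

def ucb1(results: list[int], parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: empirical mean plus an exploration bonus.

    Only len(results) and sum(results) enter the formula, so the
    order of wins and losses cannot affect the score.
    """
    n = len(results)
    mean = sum(results) / n
    return mean + c * math.sqrt(math.log(parent_visits) / n)

# The three win(1)/loss(0) sequences from the quoted message.
sequences = [
    "01010101010101010101010101010101",
    "00000000000000001111111111111111",
    "11111111111111110000000000000000",
]

PARENT_VISITS = 100  # hypothetical visit count of the parent node

scores = [ucb1([int(ch) for ch in s], PARENT_VISITS) for s in sequences]
for s, score in zip(sequences, scores):
    print(s, f"{score:.4f}")
```

All three lines print the same score: each sequence has the same number of wins over the same number of playouts, which is all UCB1 looks at.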

I have two comments. Isn't the problem here that UCT will not search the second sequence at all, because it is so bad initially? So will there ever be a situation like this? UCT will more likely continue to sample the first and third sequences, settling for the first if there is no further change. And when it finally discovers that sequence 2 is good, it will quickly choose it.

The second thing is that in practice you will rarely get any clear patterns like this, so how would you be able to detect any recency?

One simple trick is to always replay a move in the tree if it won the last time the position was visited, and only use UCT for positions where the last played move lost. Shouldn't this work like a dream if sequence 2 truly goes to a 100% win rate directly after the first win is scored?
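The replay rule suggested above can be sketched roughly as follows. This is a minimal Python illustration, not code from any existing Go program: the `Node` fields (`last_move`, `last_move_won`), the helper names, the toy `playout` function, and the single-perspective win counting (a real Go tree alternates players between levels) are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """Hypothetical tree node; field names are illustrative only."""
    visits: int = 0
    wins: int = 0
    children: dict = field(default_factory=dict)  # move -> Node
    last_move: Optional[str] = None   # move played on the previous visit
    last_move_won: bool = False       # did that playout win?

def uct_select(node: Node, c: float = math.sqrt(2)) -> str:
    """Plain UCB1 selection over the children of node."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # try unvisited moves first
        mean = child.wins / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=lambda m: score(node.children[m]))

def select_move(node: Node) -> str:
    """Replay the last move if it won on the previous visit, else use UCT."""
    if node.last_move is not None and node.last_move_won:
        return node.last_move
    return uct_select(node)

def update(node: Node, move: str, won: bool) -> None:
    """Back up one playout result and remember it for the replay rule."""
    node.visits += 1
    child = node.children[move]
    child.visits += 1
    child.wins += int(won)
    node.last_move = move
    node.last_move_won = won

def playout(move: str) -> bool:
    """Hypothetical deterministic outcome: B always wins, A always loses."""
    return move == "B"

root = Node(children={"A": Node(), "B": Node()})
for _ in range(10):
    move = select_move(root)
    update(root, move, playout(move))

print({m: (n.wins, n.visits) for m, n in root.children.items()})
```

In this toy run, A is tried once and loses, B is tried once and wins, and from then on B is replayed on every visit without consulting UCT. The same mechanism means a single lucky win keeps a move locked in until it loses once.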


-Magnus

--
Magnus Persson
Berlin, Germany
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
