Re: [computer-go] UCB/UCT and moving targets

dhillismail Thu, 26 Jun 2008 08:20:58 -0700

You can use a windowed average where the window is a fixed fraction (say the 
last third) of the total number of times the move was played. I have often used 
an IIR filter, but have never yet been able to prove that it actually helped. 
If I could write a decent Kalman filter, I would give that a try.
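
As a rough illustration of the two estimators mentioned above, here is a
minimal Python sketch. The function names, the smoothing factor alpha = 0.1,
and the initial estimate of 0.5 are assumptions chosen for the example, not
values taken from any particular program. It compares a plain mean, a
windowed mean over the last third of the samples, and a first-order IIR
filter (an exponential moving average) on the three win/loss sequences
quoted later in this thread.

```python
def plain_mean(results):
    """Ordinary win rate, as used by standard UCB: order of results is ignored."""
    return sum(results) / len(results)


def windowed_mean(results, fraction=1 / 3):
    """Win rate over the most recent `fraction` of the samples (at least one)."""
    window = max(1, int(len(results) * fraction))
    recent = results[-window:]
    return sum(recent) / len(recent)


def iir_mean(results, alpha=0.1, initial=0.5):
    """First-order IIR filter: each new result pulls the estimate by `alpha`."""
    estimate = initial
    for r in results:
        estimate += alpha * (r - estimate)
    return estimate


# The three sequences from the quoted message (1 = win, 0 = loss).
SEQUENCES = {
    "alternating": [0, 1] * 16,
    "late_wins": [0] * 16 + [1] * 16,
    "early_wins": [1] * 16 + [0] * 16,
}

if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        print(name, plain_mean(seq), windowed_mean(seq), round(iir_mean(seq), 3))
```

The plain mean is identical for all three sequences, while the windowed and
IIR estimates separate the sequence that is winning recently from the one
that is losing recently. Whether that helps actual playing strength is a
separate, empirical question.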


- Dave Hillis


-----Original Message-----
From: Peter Drake <[EMAIL PROTECTED]>
To: computer-go <computer-go@computer-go.org>
Sent: Thu, 26 Jun 2008 11:06 am
Subject: Re: [computer-go] UCB/UCT and moving targets


On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:

> Quoting Peter Drake <[EMAIL PROTECTED]>:
>
>> UCB (and hence UCT) would treat the following sequences of wins (1) and
>> losses (0) the same:
>>
>> 01010101010101010101010101010101
>> 00000000000000001111111111111111
>> 11111111111111110000000000000000
>
> I have two comments. Isn't the problem here that UCT will not search the 
> second sequence at all because it is so bad initially?

Well, UCT never really discards a branch, just samples it less and less often. 
It would eventually go back to sequence 2 and discover that it had become good.

> So will there ever be a situation like this? UCT will more likely continue 
> to sample the first and third sequences, settling for the first if there is 
> no further change. And when it finally discovers that sequence 2 is good, it 
> will quickly choose it.
>
> The second thing is that in practice you will rarely get any clear patterns 
> like this, so how would you be able to detect any recency?

I'm thinking of a situation where move A looks good when we're assuming a 
random response from the opponent, but once the child node starts using UCB, it 
is discovered that the opponent has a very good response, so now runs through A 
start losing. Later, runs through A start winning again because we discover a 
good counterresponse...

> One simple trick is to always replay a move in the tree if it won the last 
> time the position was visited, and only use UCT for positions where the 
> last played move lost. Shouldn't this work like a dream if sequence 2 
> truly goes to a 100% win rate directly after the first win is scored?

Probably, but we rarely get such pure victory branches. I have a number of 
schemes in mind, though.

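For concreteness, the "replay the move if it won last time" trick quoted
above could be sketched roughly as follows in Python. This is only an
illustration under assumptions: the Node class, its field names, and the
exploration constant sqrt(2) are hypothetical and do not come from any
existing engine; the fallback is ordinary UCB1 on the plain win rate.

```python
import math


class Node:
    """Hypothetical tree node holding per-move [wins, visits] counts."""

    def __init__(self, moves):
        self.stats = {m: [0, 0] for m in moves}
        self.last_move = None   # move played on the previous visit
        self.last_won = False   # whether that playout was a win

    def ucb1(self, c=math.sqrt(2)):
        """Standard UCB1: try unvisited moves first, then maximize the bound."""
        total = sum(visits for _, visits in self.stats.values())
        best_move, best_score = None, float("-inf")
        for move, (wins, visits) in self.stats.items():
            if visits == 0:
                return move
            score = wins / visits + c * math.sqrt(math.log(total) / visits)
            if score > best_score:
                best_move, best_score = move, score
        return best_move

    def select(self):
        """Replay the previous move if it just won; otherwise fall back to UCB1."""
        if self.last_move is not None and self.last_won:
            return self.last_move
        return self.ucb1()

    def update(self, move, won):
        """Record a playout result and remember it for the next selection."""
        self.stats[move][0] += int(won)
        self.stats[move][1] += 1
        self.last_move, self.last_won = move, won
```

With this rule a move that starts winning every time is replayed
immediately, without waiting for its average to recover. A move that wins
only most of the time still falls back to UCB1 after each loss, which is
the "pure victory branch" caveat raised above.
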
Has anyone tried this empirically?

Peter Drake
http://www.lclark.edu/~drake/


_______________________________________________
computer-go mailing list
[EMAIL PROTECTED]
http://www.computer-go.org/mailman/listinfo/computer-go/