Cool! Now for the cases where I'd want a Kalman filter, I'd need it to predict 
the future state of a non-stationary, multimodal distribution. A typical 
pattern is for a node to start out with optimistic scores but to experience a 
strong pessimistic trend later as UCT starts to focus more on effective enemy 
counter moves. Or the reverse, or sometimes a see-saw. Closer to the root, we 
are?seeing a superposition of many such distributions. My gut feeling is that 
the UCT tree is already doing a better job of what I'm asking the Kalman filter 
to do. But it's a nagging question for me.

Something, that actually has helped me, is to look at the difference of short 
term and long term averages as a diagnostic. For instance, when my program 
plays, I sometimes see the long term average at the root with a high score 
(probability of win), but the short term trend at the end showing a low score. 
When this happens, my program is in big trouble. One common cause is that the 
enemy has a winning sequence, but it involves making a self-atari move, or a 
monkey jump along the edge, or some other move that my progressive widening 
algorithm considers pathological and unworthy of early exploration.

-Dave Hillis

-----Original Message-----
From: Jason House <[EMAIL PROTECTED]>
To: computer-go <computer-go@computer-go.org>
Sent: Thu, 26 Jun 2008 2:00 pm
Subject: Re: [computer-go] UCB/UCT and moving targets



I probably exceeded my math quota already, but I should add that

? UCB = w + k*sqrt(P)




If k=a*sqrt(log(...)), this becomes:

? UCB = w + a*sqrt(log(...)*P)




Those looking for a drop into code, the above equation is what you'd want.




Note that if P = 0.25/n (from the no drift case), this should look nearly 
identical to typical UCT code...

? UCB = w + (a/2)*sqrt(log(...)/n)


On Jun 26, 2008, at 1:40 PM, "Jason House" <[EMAIL PROTECTED]> wrote:






On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote:


You can use a windowed average where the window is a fixed fraction (say the 
last third) of the total times the move was made. I have often used an IIR 
filter and have never yet been able to prove that it actually helped. If I 
could write a decent Kalman filter, I would give that a try.

- Dave Hillis




Ooh, interesting idea!? I couldn't keep myself from working it out for the 
simplest of case of considering a single move in isolation with well-behaved 
noise.? At the end I show that this approach boils down to simple averaging of 
all samples when the drift is assumed to be zero.? I also give equations for 
updates that could be dropped into existing code.

Assuming a model of:
?? sample(n+1) = win_rate(n) + sample_noise
?? win_rate(n+1) = win_rate(n) + drift

sample = result of one monte carlo sim, 0 or 1
For simplicity, assume sample_noise and drift are stationary random processes
Most literature on UCT I see assumes variance(sample_noise) = 0.25

For the equations below, I'm going to use the following short hand:
? n = sample number
? s = sample(n+1)
? q = variance(sample_noise)?? (q for quantization)
? w = (estimated) win rate
? d = variance(drift)
? Other variables internal to the Kalman Filter:
? ?? P = estimation variance 
???? K = gain
? 
That yields the following updates following each sample
? P = P + d
? K = P / (P + q)
? w = w + (s-w)*K
? P = (1-K)*P

If we assume d=0, and assume P has the form q/n before the sample, this becomes:
? w = w + (s-w)/n = (nw+s)/(n+1) = average of all samples
? P = q/(n+1)


= 


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to