Sent from my iPhone
On Jun 26, 2008, at 3:23 PM, [EMAIL PROTECTED] wrote:
Cool! Now for the cases where I'd want a Kalman filter, I'd need it
to predict the future state of a non-stationary, multimodal
distribution. A typical pattern is for a node to start out with
optimistic scores but to experience a strong pessimistic trend later
as UCT starts to focus more on effective enemy counter moves. Or the
reverse, or sometimes a see-saw.
The drift term is intended to take care of that kind of behavior.
Closer to the root, we are seeing a superposition of many such
distributions. My gut feeling is that the UCT tree is already doing
a better job of what I'm asking the Kalman filter to do. But it's a
nagging question for me.
There's no way for a single floating point value to capture deeper
structure.
Something, that actually has helped me, is to look at the difference
of short term and long term averages as a diagnostic. For instance,
when my program plays, I sometimes see the long term average at the
root with a high score (probability of win), but the short term
trend at the end showing a low score. When this happens, my program
is in big trouble. One common cause is that the enemy has a winning
sequence, but it involves making a self-atari move, or a monkey jump
along the edge, or some other move that my progressive widening
algorithm considers pathological and unworthy of early exploration.
-Dave Hillis
-----Original Message-----
From: Jason House <[EMAIL PROTECTED]>
To: computer-go <computer-go@computer-go.org>
Sent: Thu, 26 Jun 2008 2:00 pm
Subject: Re: [computer-go] UCB/UCT and moving targets
I probably exceeded my math quota already, but I should add that
UCB = w + k*sqrt(P)
If k=a*sqrt(log(...)), this becomes:
UCB = w + a*sqrt(log(...)*P)
Those looking for a drop into code, the above equation is what you'd
want.
Note that if P = 0.25/n (from the no drift case), this should look
nearly identical to typical UCT code...
UCB = w + (a/2)*sqrt(log(...)/n)
On Jun 26, 2008, at 1:40 PM, "Jason House" <[EMAIL PROTECTED]
> wrote:
On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote:
You can use a windowed average where the window is a fixed fraction
(say the last third) of the total times the move was made. I have
often used an IIR filter and have never yet been able to prove that
it actually helped. If I could write a decent Kalman filter, I
would give that a try.
- Dave Hillis
Ooh, interesting idea! I couldn't keep myself from working it out
for the simplest of case of considering a single move in isolation
with well-behaved noise. At the end I show that this approach
boils down to simple averaging of all samples when the drift is
assumed to be zero. I also give equations for updates that could
be dropped into existing code.
Assuming a model of:
sample(n+1) = win_rate(n) + sample_noise
win_rate(n+1) = win_rate(n) + drift
sample = result of one monte carlo sim, 0 or 1
For simplicity, assume sample_noise and drift are stationary random
processes
Most literature on UCT I see assumes variance(sample_noise) = 0.25
For the equations below, I'm going to use the following short hand:
n = sample number
s = sample(n+1)
q = variance(sample_noise) (q for quantization)
w = (estimated) win rate
d = variance(drift)
Other variables internal to the Kalman Filter:
P = estimation variance
K = gain
That yields the following updates following each sample
P = P + d
K = P / (P + q)
w = w + (s-w)*K
P = (1-K)*P
If we assume d=0, and assume P has the form q/n before the sample,
this becomes:
w = w + (s-w)/n = (nw+s)/(n+1) = average of all samples
P = q/(n+1)
=
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
Get the Moviefone Toolbar. Showtimes, theaters, movie news, & more!
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/