I probably exceeded my math quota already, but I should add that
UCB = w + k*sqrt(P)
If k=a*sqrt(log(...)), this becomes:
UCB = w + a*sqrt(log(...)*P)
Those looking for a drop into code, the above equation is what you'd
want.
Note that if P = 0.25/n (from the no drift case), this should look
nearly identical to typical UCT code...
UCB = w + (a/2)*sqrt(log(...)/n)
On Jun 26, 2008, at 1:40 PM, "Jason House"
<[EMAIL PROTECTED]> wrote:
On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote:
You can use a windowed average where the window is a fixed fraction
(say the last third) of the total times the move was made. I have
often used an IIR filter and have never yet been able to prove that
it actually helped. If I could write a decent Kalman filter, I would
give that a try.
- Dave Hillis
Ooh, interesting idea! I couldn't keep myself from working it out
for the simplest of case of considering a single move in isolation
with well-behaved noise. At the end I show that this approach boils
down to simple averaging of all samples when the drift is assumed to
be zero. I also give equations for updates that could be dropped
into existing code.
Assuming a model of:
sample(n+1) = win_rate(n) + sample_noise
win_rate(n+1) = win_rate(n) + drift
sample = result of one monte carlo sim, 0 or 1
For simplicity, assume sample_noise and drift are stationary random
processes
Most literature on UCT I see assumes variance(sample_noise) = 0.25
For the equations below, I'm going to use the following short hand:
n = sample number
s = sample(n+1)
q = variance(sample_noise) (q for quantization)
w = (estimated) win rate
d = variance(drift)
Other variables internal to the Kalman Filter:
P = estimation variance
K = gain
That yields the following updates following each sample
P = P + d
K = P / (P + q)
w = w + (s-w)*K
P = (1-K)*P
If we assume d=0, and assume P has the form q/n before the sample,
this becomes:
w = w + (s-w)/n = (nw+s)/(n+1) = average of all samples
P = q/(n+1)
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/