On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote:

> You can use a windowed average where the window is a fixed fraction (say
> the last third) of the total times the move was made. I have often used an
> IIR filter and have never yet been able to prove that it actually helped. If
> I could write a decent Kalman filter, I would give that a try.
>
> - Dave Hillis


Ooh, interesting idea!  I couldn't keep myself from working it out for the
simplest of case of considering a single move in isolation with well-behaved
noise.  At the end I show that this approach boils down to simple averaging
of all samples when the drift is assumed to be zero.  I also give equations
for updates that could be dropped into existing code.

Assuming a model of:
   sample(n+1) = win_rate(n) + sample_noise
   win_rate(n+1) = win_rate(n) + drift

sample = result of one monte carlo sim, 0 or 1
For simplicity, assume sample_noise and drift are stationary random
processes
Most literature on UCT I see assumes variance(sample_noise) = 0.25

For the equations below, I'm going to use the following short hand:
  n = sample number
  s = sample(n+1)
  q = variance(sample_noise)   (q for quantization)
  w = (estimated) win rate
  d = variance(drift)
  Other variables internal to the Kalman Filter:
     P = estimation variance
     K = gain

That yields the following updates following each sample
  P = P + d
  K = P / (P + q)
  w = w + (s-w)*K
  P = (1-K)*P

If we assume d=0, and assume P has the form q/n before the sample, this
becomes:
  w = w + (s-w)/n = (nw+s)/(n+1) = average of all samples
  P = q/(n+1)
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to