You might be interested in the delta-bar-delta algorithm for adapting the
gain size (0.99 in your example):
http://www.cs.ualberta.ca/~sutton/papers/sutton-92a.pdf
Lukasz Lew
On Thu, Jun 26, 2008 at 19:58, Jason House <[EMAIL PROTECTED]> wrote:
> I tend to like exponentially weighted moving averages whe
On Jun 26, 2008, at 6:03 PM, [EMAIL PROTECTED] wrote:
> -Original Message-
> From: Jason House <[EMAIL PROTECTED]>
> On Jun 26, 2008, at 3:23 PM, [EMAIL PROTECTED] wrote:
Cool! Now for the cases where I'd want a Kalman filter, I'd need it
to predict the future state of a non-stationar
> -Original Message-
> From: Jason House <[EMAIL PROTECTED]>
> On Jun 26, 2008, at 3:23 PM, [EMAIL PROTECTED] wrote:
Cool! Now for the cases where I'd want a Kalman filter, I'd need it to predict
the future state of a non-stationary, multimodal distribution. A typical
pattern is for
Just what I was looking for -- thanks!
On Jun 26, 2008, at 12:04 PM, Rémi Coulom wrote:
Peter Drake wrote:
Can anyone point me to a thread, or at least some buzzwords?
I'm having little luck googling for words like "recent" and "forget".
Thanks,
Peter Drake
http://www.lclark.edu/~drake/
T
is
-Original Message-
From: Jason House <[EMAIL PROTECTED]>
To: computer-go
Sent: Thu, 26 Jun 2008 2:00 pm
Subject: Re: [computer-go] UCB/UCT and moving targets
I probably exceeded my math quota already, but I should add that
UCB = w + k*sqrt(P)
If k=a*sqrt(log(...)), this becomes
nd unworthy of early exploration.
-Dave Hillis
-Original Message-
From: Jason House <[EMAIL PROTECTED]>
To: computer-go
Sent: Thu, 26 Jun 2008 2:00 pm
Subject: Re: [computer-go] UCB/UCT and moving targets
I probably exceeded my math quota already, but I should add that
UCB = w +
Peter Drake wrote:
Can anyone point me to a thread, or at least some buzzwords?
I'm having little luck googling for words like "recent" and "forget".
Thanks,
Peter Drake
http://www.lclark.edu/~drake/
Try "discounted UCB":
http://computer-go.org/pipermail/computer-go/2007-March/009033.html
h
I tend to like exponentially weighted moving averages when I need a
fading memory. That keeps storage simple, updates fast, and nearly the
same effect
i.e.
wins = 0.99*wins + latest_result
sims = 0.99*sims + 1
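
A minimal Python sketch of the two update lines above, in case it helps anyone. The class name, the method names, and the 0.5 fallback for an unvisited move are my own assumptions, not anything from an existing program; only the 0.99 decay and the two updates come from the message.

```python
class FadingStats:
    """Exponentially weighted win/simulation counts (hypothetical helper)."""

    def __init__(self, decay=0.99):
        self.decay = decay  # 0.99 as in the example above
        self.wins = 0.0
        self.sims = 0.0

    def update(self, latest_result):
        # latest_result is 1 for a win, 0 for a loss
        self.wins = self.decay * self.wins + latest_result
        self.sims = self.decay * self.sims + 1.0

    def win_rate(self):
        # Assumption: report 0.5 before any simulation has been recorded
        return self.wins / self.sims if self.sims > 0 else 0.5
```

With a decay of 0.99 the effective sample size settles near 1/(1 - 0.99) = 100 simulations, so older results fade out without storing any history.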
Sent from my iPhone
On Jun 26, 2008, at 2:40 PM, "Ivan Dubois" <[EMAIL PROTECTED]>
Can anyone point me to a thread, or at least some buzzwords?
I'm having little luck googling for words like "recent" and "forget".
Thanks,
Peter Drake
http://www.lclark.edu/~drake/
On Jun 26, 2008, at 11:40 AM, Ivan Dubois wrote:
This same topic already occurred on the list some time ago.
I
This same topic already occurred on the list some time ago.
I think the idea is to "forget" older results. For example, you can compute
the win rate based only on the last 500 simulations. Older information may
not be up to date and will not help much because 500 simulations is enough
to compute
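
A rough Python sketch of the "last 500 simulations" idea described above, assuming results are stored as 1 (win) or 0 (loss). The class and method names and the 0.5 fallback for an empty window are illustrative assumptions, not taken from any posted code.

```python
from collections import deque


class WindowedWinRate:
    """Win rate over only the most recent simulations (hypothetical helper)."""

    def __init__(self, window=500):
        # deque with maxlen silently drops the oldest result when full
        self.results = deque(maxlen=window)

    def update(self, result):
        self.results.append(result)

    def win_rate(self):
        # Assumption: report 0.5 before any simulation has been recorded
        if not self.results:
            return 0.5
        return sum(self.results) / len(self.results)
```

The cost compared with a plain running average is one stored result per slot in the window for every move being tracked.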
I probably exceeded my math quota already, but I should add that
UCB = w + k*sqrt(P)
If k=a*sqrt(log(...)), this becomes:
UCB = w + a*sqrt(log(...)*P)
For those looking for something to drop into code, the above equation is
what you'd want.
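
As a sketch only, the equation could be written in Python like this. The function name and argument names are assumptions; in particular log_term stands for whatever value "log(...)" takes in your program, which is left to the caller here because the message does not spell it out.

```python
import math


def ucb(w, P, a, log_term):
    """UCB = w + a*sqrt(log(...)*P), with log(...) supplied as log_term."""
    # w: win-rate estimate, P: variance-like term, a: exploration constant
    return w + a * math.sqrt(log_term * P)
```

This is the same as w + k*sqrt(P) with k = a*sqrt(log_term), only with the two square roots merged.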
Note that if P = 0.25/n (from the no drift case), this should
On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote:
> You can use a windowed average where the window is a fixed fraction (say
> the last third) of the total times the move was made. I have often used an
> IIR filter and have never yet been able to prove that it actually helped. If
> I co
A van Kessel wrote:
01010101010101010101010101010101
IMHO they are exactly the same and should be as such.
At the start of every simulation (before a 0 or 1 is reported),
the situation is (should be) exactly the same.
So there i
> 01010101010101010101010101010101
>
>
IMHO they are exactly the same and should be as such.
At the start of every simulation (before a 0 or 1 is reported),
the situation is (should be) exactly the same.
So there is no difference w
Hillis
-Original Message-
From: Peter Drake <[EMAIL PROTECTED]>
To: computer-go
Sent: Thu, 26 Jun 2008 11:06 am
Subject: Re: [computer-go] UCB/UCT and moving targets
On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:

> Quoting Peter Drake <[EMAIL PROTECTED]>:
>
On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote:
Quoting Peter Drake <[EMAIL PROTECTED]>:
UCB (and hence UCT) would treat the following sequences of wins
(1) and
losses (0) the same:
01010101010101010101010101010101
I hav
Quoting Peter Drake <[EMAIL PROTECTED]>:
UCB (and hence UCT) would treat the following sequences of wins (1) and
losses (0) the same:
01010101010101010101010101010101
I have two comments. Isn't the problem here that UCT will no