Re: [computer-go] UCB/UCT and moving targets

2008-07-14 Thread Łukasz Lew
You might be interested in the delta-bar-delta algorithm for adapting the gain size (0.99 in your example): http://www.cs.ualberta.ca/~sutton/papers/sutton-92a.pdf Lukasz Lew On Thu, Jun 26, 2008 at 19:58, Jason House <[EMAIL PROTECTED]> wrote: > I tend to like exponentially weighted moving averages whe

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Jason House
On Jun 26, 2008, at 6:03 PM, [EMAIL PROTECTED] wrote: > -Original Message- > From: Jason House <[EMAIL PROTECTED]> > On Jun 26, 2008, at 3:23 PM, [EMAIL PROTECTED] wrote: Cool! Now for the cases where I'd want a Kalman filter, I'd need it to predict the future state of a non-stationar

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread dhillismail
> -Original Message- > From: Jason House <[EMAIL PROTECTED]> > On Jun 26, 2008, at 3:23 PM, [EMAIL PROTECTED] wrote: Cool! Now for the cases where I'd want a Kalman filter, I'd need it to predict the future state of a non-stationary, multimodal distribution. A typical pattern is for

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Peter Drake
Just what I was looking for -- thanks! On Jun 26, 2008, at 12:04 PM, Rémi Coulom wrote: Peter Drake wrote: Can anyone point me to a thread, or at least some buzzwords? I'm having little luck googling for words like "recent" and "forget". Thanks, Peter Drake http://www.lclark.edu/~drake/ T

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Jason House
is -Original Message- From: Jason House <[EMAIL PROTECTED]> To: computer-go Sent: Thu, 26 Jun 2008 2:00 pm Subject: Re: [computer-go] UCB/UCT and moving targets I probably exceeded my math quota already, but I should add that UCB = w + k*sqrt(P) If k=a*sqrt(log(...)), this becomes

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread dhillismail
nd unworthy of early exploration. -Dave Hillis -Original Message- From: Jason House <[EMAIL PROTECTED]> To: computer-go Sent: Thu, 26 Jun 2008 2:00 pm Subject: Re: [computer-go] UCB/UCT and moving targets I probably exceeded my math quota already, but I should add that UCB = w +

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Rémi Coulom
Peter Drake wrote: Can anyone point me to a thread, or at least some buzzwords? I'm having little luck googling for words like "recent" and "forget". Thanks, Peter Drake http://www.lclark.edu/~drake/ Try "discounted UCB": http://computer-go.org/pipermail/computer-go/2007-March/009033.html h

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Jason House
I tend to like exponentially weighted moving averages when I need a fading memory. That keeps storage simple and updates fast, with nearly the same effect, i.e. wins = 0.99*wins + latest_result; sims = 0.99*sims + 1. On Jun 26, 2008, at 2:40 PM, "Ivan Dubois" <[EMAIL PROTECTED]>
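The fading-memory update in the message above can be sketched as follows. This is only an illustration, not code from the thread: the class name `FadingStats`, the `decay` parameter (0.99 in the message), and the 0.5 value returned before any simulation are assumptions made for the example.

```python
class FadingStats:
    """Exponentially weighted win/simulation counters (hypothetical helper)."""

    def __init__(self, decay=0.99):
        self.decay = decay  # gain from the message; the best value is not known
        self.wins = 0.0
        self.sims = 0.0

    def update(self, result):
        """Fold in one playout result (1 for a win, 0 for a loss)."""
        self.wins = self.decay * self.wins + result
        self.sims = self.decay * self.sims + 1.0

    def win_rate(self):
        # Assumption: report 0.5 when nothing has been simulated yet.
        return self.wins / self.sims if self.sims > 0 else 0.5


stats = FadingStats()
for result in [0] * 100 + [1] * 100:  # the move starts losing, then starts winning
    stats.update(result)
print(stats.win_rate())  # a plain average would report exactly 0.5
```

Note that `sims` can never exceed 1/(1 - decay), so with a decay of 0.99 the estimate behaves roughly like an average over the most recent 100 simulations.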

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Peter Drake
Can anyone point me to a thread, or at least some buzzwords? I'm having little luck googling for words like "recent" and "forget". Thanks, Peter Drake http://www.lclark.edu/~drake/ On Jun 26, 2008, at 11:40 AM, Ivan Dubois wrote: This same topic already occurred on the list some time ago. I

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Ivan Dubois
This same topic already occurred on the list some time ago. I think the idea is to "forget" older results. For example, you can compute the win rate based only on the last 500 simulations. Older information may not be up to date and will not help much because 500 simulations is enough to compute
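One way to "forget" everything but the last 500 simulations, as suggested in the message above, is a fixed-size window. The sketch below is an assumption about how that might be coded, not the poster's implementation; `WindowedStats` and its methods are hypothetical names, and the 0.5 returned for an empty window is an arbitrary choice.

```python
from collections import deque


class WindowedStats:
    """Win rate over the most recent `window` playouts (hypothetical helper)."""

    def __init__(self, window=500):
        self.results = deque(maxlen=window)
        self.wins = 0

    def update(self, result):
        """Record one playout result (1 for a win, 0 for a loss)."""
        if len(self.results) == self.results.maxlen:
            # The oldest result is about to be pushed out of the window.
            self.wins -= self.results[0]
        self.results.append(result)
        self.wins += result

    def win_rate(self):
        # Assumption: report 0.5 when nothing has been simulated yet.
        return self.wins / len(self.results) if self.results else 0.5


stats = WindowedStats(window=500)
for result in [0] * 500 + [1] * 500:
    stats.update(result)
print(stats.win_rate())  # only the last 500 results are counted
```

Compared with an exponentially weighted average, this costs one stored result per simulation in the window, for every move being tracked.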

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Jason House
I probably exceeded my math quota already, but I should add that UCB = w + k*sqrt(P). If k = a*sqrt(log(...)), this becomes: UCB = w + a*sqrt(log(...)*P). For those looking for something to drop into code, the above equation is what you'd want. Note that if P = 0.25/n (from the no drift case), this should
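The equation UCB = w + a*sqrt(log(...)*P) from the message above can be written as a small function. The argument of the logarithm is elided in the message, so this sketch takes the already-computed log term as a parameter rather than guessing it; the function names, the default `a=1.0`, and the use of the total simulation count in the usage lines are all assumptions made for illustration.

```python
import math


def ucb(w, P, log_term, a=1.0):
    """UCB = w + a*sqrt(log_term * P).

    w        -- estimated win rate of the move
    P        -- variance of that estimate
    log_term -- value of log(...); its argument is left to the caller
    a        -- exploration constant (hypothetical default)
    """
    return w + a * math.sqrt(log_term * P)


def no_drift_variance(n):
    """P = 0.25/n, the no-drift case mentioned in the message."""
    return 0.25 / n


# Usage, assuming the log is taken over the total number of simulations:
total_sims = 1000
n = 100
print(ucb(0.5, no_drift_variance(n), math.log(total_sims), a=2.0))
```

With P = 0.25/n the bonus term equals (a/2)*sqrt(log_term/n), so the familiar sqrt(log/n) shape is recovered up to the constant.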

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Jason House
On Thu, Jun 26, 2008 at 11:20 AM, <[EMAIL PROTECTED]> wrote: > You can use a windowed average where the window is a fixed fraction (say > the last third) of the total times the move was made. I have often used an > IIR filter and have never yet been able to prove that it actually helped. If > I co
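The windowed average quoted above, where the window is a fixed fraction (say the last third) of the times the move was made, might look like the sketch below. It assumes the full result history of a move is kept in a list; `recent_fraction_win_rate`, the `divisor` parameter, and the 0.5 returned for an empty history are hypothetical choices for the example.

```python
def recent_fraction_win_rate(results, divisor=3):
    """Average the most recent 1/divisor of the results (at least one result).

    results -- playout outcomes for one move, oldest first (1 = win, 0 = loss)
    divisor -- 3 means "the last third", as in the quoted message
    """
    n = len(results)
    if n == 0:
        return 0.5  # assumption: neutral value before any simulation
    k = max(1, n // divisor)  # window grows with the number of simulations
    recent = results[-k:]
    return sum(recent) / k


history = [0] * 6 + [1] * 3
print(recent_fraction_win_rate(history))  # averages only the last three results
```

Unlike a fixed-size window, this one keeps widening as the move is tried more often, so the estimate still tightens over time while ignoring the earliest results.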

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Gian-Carlo Pascutto
A van Kessel wrote: 01010101010101010101010101010101 IMHO they are exactly the same and should be as such. At the start of every simulation (before a 0 or 1 is reported), the situation is (should be) exactly the same. So there i

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread A van Kessel
> 01010101010101010101010101010101 > > IMHO they are exactly the same and should be as such. At the start of every simulation (before a 0 or 1 is reported), the situation is (should be) exactly the same. So there is no difference w

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread dhillismail
Hillis -Original Message- From: Peter Drake <[EMAIL PROTECTED]> To: computer-go Sent: Thu, 26 Jun 2008 11:06 am Subject: Re: [computer-go] UCB/UCT and moving targets On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote: > Quoting Peter Drake <[EMAIL PROTECTED]>: >

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Peter Drake
On Jun 26, 2008, at 1:35 AM, Magnus Persson wrote: Quoting Peter Drake <[EMAIL PROTECTED]>: UCB (and hence UCT) would treat the following sequences of wins (1) and losses (0) the same: 01010101010101010101010101010101 I hav

Re: [computer-go] UCB/UCT and moving targets

2008-06-26 Thread Magnus Persson
Quoting Peter Drake <[EMAIL PROTECTED]>: UCB (and hence UCT) would treat the following sequences of wins (1) and losses (0) the same: 01010101010101010101010101010101 I have two comments. Isn't the problem here that UCT will no