On Sun, 2008-11-30 at 14:49 -0200, Mark Boon wrote:
> Indeed, the scaling question is very important. Even though I think I
> have AMAF/RAVE working now, it's still not so clear-cut what it's
> worth. With just 2,000 playouts I'm seeing an 88% win rate against
> plain old UCT tree search without RAVE. At 10,000 playouts this win
> rate drops to 75%; at 50,000, to 69%. All of these results have a
> margin of error of a few points, but the trend is obvious. UCT plays
> weaker than UCT+RAVE, but it scales a little better.
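[Editor's note: the UCT+RAVE blend being compared here can be sketched as below. The beta schedule is the equivalence-parameter form commonly attributed to David Silver, beta = sqrt(k / (3n + k)); the function name and the value of k are illustrative assumptions, not Mark's actual code.]

```python
import math

def rave_value(q_uct, n_uct, q_rave, k=1000):
    """Blend the direct (UCT) and AMAF (RAVE) value estimates for a move.

    beta starts near 1 (trust the AMAF statistics) and decays toward 0
    as the move accumulates direct playouts, so plain UCT takes over.
    k is the "equivalence parameter": roughly the playout count at
    which the two estimates carry comparable weight.
    """
    beta = math.sqrt(k / (3 * n_uct + k))
    return beta * q_rave + (1 - beta) * q_uct

# Early on, the AMAF estimate dominates:
early = rave_value(q_uct=0.5, n_uct=10, q_rave=0.8)
# With many direct playouts, the blend approaches the plain UCT mean:
late = rave_value(q_uct=0.5, n_uct=100000, q_rave=0.8)
```

This is consistent with the observation in the thread: as playout counts grow, beta shrinks and the RAVE term contributes less and less.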

Not necessarily.  Start with UCT and UCT+RAVE and find a level where
they are equivalent in strength.  Whatever point this is, the UCT
version will take much longer to search.  Now test a version of each
with some constant factor more playouts (or time); for instance, test
a version of each that does 3X more work.  Which is stronger now?
That is how you tell whether it scales better or not.  You must start
from the same point Elo-wise.
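[Editor's note: the protocol above can be sketched in code. The strength curves below are toy, invented log-linear curves purely to illustrate the procedure (equalize Elo first, then give both sides the same constant factor more work); the numbers themselves mean nothing.]

```python
import math

# Hypothetical Elo-vs-playouts curves for two engines (pure invention).
def elo_a(playouts):            # e.g. UCT+RAVE: stronger at low effort
    return 1500 + 220 * math.log2(playouts / 1000)

def elo_b(playouts):            # e.g. plain UCT: gains more per doubling
    return 1400 + 260 * math.log2(playouts / 1000)

def playouts_for_elo(elo_fn, target, lo=1.0, hi=1e9):
    """Bisect (in log space) for the playout count reaching `target` Elo."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if elo_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return mid

# Step 1: find effort levels where the two are equivalent in strength.
ref = 1600.0
n_a = playouts_for_elo(elo_a, ref)
n_b = playouts_for_elo(elo_b, ref)

# Step 2: give each the same constant factor more work and compare.
gap_at_3x = elo_a(3 * n_a) - elo_b(3 * n_b)  # negative: B scales better
```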

My idea is that if it doesn't scale,  there may be some fundamental
improvement still waiting to be found.    In the worst case you can
gradually phase it out. 

Of course, as you say, they may converge, perhaps in some asymptotic
way, so that at any given level of effort the RAVE enhancement always
helps, but less and less so.

- Don


> This doesn't necessarily mean they converge. From the few data points
> that I have, it looks like UCT+RAVE might converge to a win rate of
> 66% against plain UCT search with playouts in the hundred-thousands
> or millions. Is that about 100 Elo points? That in itself would be
> justification enough to keep it. But there's a computation cost as
> well. Plus, as soon as you start to introduce other move-selection
> procedures, they may eat into the gain RAVE provides even further.
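[Editor's note: under the standard logistic Elo model, the difference is 400 * log10(p / (1 - p)), so a 66% win rate works out to roughly 115 Elo; "about 100" is close. A quick check:]

```python
import math

def elo_diff(win_rate):
    """Elo difference implied by a win rate, standard logistic model."""
    return 400 * math.log10(win_rate / (1 - win_rate))

diff_66 = elo_diff(0.66)   # the asymptotic win rate Mark estimates
diff_88 = elo_diff(0.88)   # the win rate he sees at 2,000 playouts
```

This gives roughly 115 Elo for 66% and roughly 346 Elo for 88%, so the gain shrinking from 88% toward 66% is a large Elo swing.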
> 
> Anyhow, the way I have it set up now I can easily switch between
> using AMAF information to compute RAVE or not. There are also still
> some parameters to tune, so this is far from the last word on it;
> it's more like a good starting point. Also, even if it's not
> something to use in a final playing engine, it's good to have a
> baseline that provides the best possible time/strength combination
> to run quick tests against.
> 
> Is there actually a well-understood basis for the diminishing
> returns of UCT+RAVE vs. UCT? I have given it a little thought, but
> it's not entirely obvious to me why UCT+RAVE wouldn't scale better
> than what I'm seeing.
> 
> I've also run into a few 'fluke' results. Winning streaks of a dozen
> games in a row (or more) happen between equally strong programs. So
> to be reasonably sure, I'd like to get about 1,000 games. If you
> want to make sure two implementations are equivalent, as in the case
> of the ref-bots, I'd recommend 10,000 games.
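[Editor's note: those game counts match the usual normal-approximation margin of error on a measured win rate; this is a textbook formula, not anything from Mark's code.]

```python
import math

def margin_95(p, n):
    """Approximate 95% margin of error (normal approximation) for a
    win rate p measured over n independent games."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Near p = 0.5: about +/-10% at 100 games, +/-3% at 1,000 games,
# and +/-1% at 10,000 games.
m_100 = margin_95(0.5, 100)
m_1000 = margin_95(0.5, 1000)
m_10000 = margin_95(0.5, 10000)
```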
> 
> If all I want to know is whether something is an improvement or not,
> then I usually settle for fewer games. If after a (few) hundred
> games I see a win rate of 50% or less, I decide it's not an
> improvement (not one worth anything, anyway); if I see a win rate of
> around 60% or more, I keep it. Anything in between, I might let it
> run a bit more. The improvements that I keep, I run with longer
> thinking times overnight to see if they scale. After all, the only
> real test worth anything is under realistic playing circumstances.
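[Editor's note: Mark's informal rule can be written out as a sketch; the thresholds are his stated rules of thumb, and a formal sequential test such as the SPRT would be the rigorous version of this idea.]

```python
def verdict(wins, games, reject_at=0.50, keep_at=0.60):
    """Informal stop/keep decision after a batch of test games."""
    rate = wins / games
    if rate <= reject_at:
        return "reject"          # not an improvement worth keeping
    if rate >= keep_at:
        return "keep"            # clear enough: keep it, retest longer
    return "run more games"      # inconclusive; extend the match
```

For example, 130 wins in 200 games (65%) would be kept, 95 in 200 (47.5%) rejected, and 110 in 200 (55%) left running.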
> 
> Mark
> 
> On 29-nov-08, at 11:32, Don Dailey wrote:
> 
> > On Sat, 2008-11-29 at 11:58 +0100, Denis fidaali wrote:
> >>
> >>  From my own experience, an important testing case whenever trying
> >> to get AMAF to work is the scaling study.
> >>
> >
> > No truer words were ever spoken.  This is one of the secrets to
> > strong programs: if they scale, they are probably soundly designed.
> > I do that with Chess.  I find that some program changes scale up,
> > particularly sound algorithms that reduce the branching factor.  I
> > have to run tests pretty fast in order to get results I can
> > interpret, but I also plot the results visually with gnuplot.
> >
> > As many here will recall, my own Fatman study vs. Leela showed that
> > Leela scaled better with increasing depth than Fatman.  Nothing
> > like a graph to reveal this very clearly, although you can also
> > look at the numbers if you are careful.
> >
> > It's important to point out that you will be completely misled if
> > you don't get enough samples.  It's very rare that 100 or 200 games
> > are enough to draw any conclusions (unless the result is really
> > lopsided).  I remember once thinking I had found a clear, scalable
> > improvement but deciding that it must run longer (though I was
> > hopeful).  When the improvement started to decline, I discovered
> > that I had by accident been running the exact same program against
> > itself.
> >
> > The point is that it is not uncommon to get really "lucky" and
> > have an equal program look substantially superior, for a while.
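[Editor's note: Don's warning can be quantified. The dynamic program below (illustrative, not from the thread) computes the chance that two exactly equal programs produce at least one 12-game winning streak; over a 1,000-game match it comes out to roughly one chance in nine, so such streaks are genuinely expected.]

```python
def prob_streak(n_games, streak=12, p=0.5):
    """Probability that n_games independent games, each won with
    probability p, contain at least one winning streak of length
    `streak`.  Dynamic program over the current run length."""
    state = [0.0] * streak   # state[r]: prob of current run r, no streak yet
    state[0] = 1.0
    done = 0.0               # prob a full streak has already occurred
    for _ in range(n_games):
        new = [0.0] * streak
        for r, pr in enumerate(state):
            if pr == 0.0:
                continue
            if r + 1 == streak:
                done += pr * p         # this win completes the streak
            else:
                new[r + 1] += pr * p   # this win extends the run
            new[0] += pr * (1 - p)     # a loss resets the run
        state = new
    return done

p_1000 = prob_streak(1000, streak=12)  # chance of a 12-game streak
```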
> >
> > - Don
> >
> >
> >>  My prototype was quite strong considering that it used only 1,000
> >> light playouts (it scored a 25-30% win rate against GNU Go level
> >> 0), but it seemed not to get much beyond that as the number of
> >> playouts grew ... (it also had a serious exponential-complexity
> >> problem, which I never went to the trouble of investigating :) )
> >>
> >>  I know that Zoe was about 2000 Elo with, I think, 50k
> >> simulations, and ... never got much better as the number of
> >> simulations increased.
> >>
> >> Both prototypes were toying with AMAF, so I really think you need
> >> a bit of scalability study whenever trying to assess an engine
> >> employing it. (Although it could very well be that the scalability
> >> trouble came out of some nasty bugs; both aforementioned
> >> prototypes were quite messy ...)
> >>
> >>> From: [EMAIL PROTECTED]
> >>> Subject: Re: [computer-go] RAVE formula of David Silver (reposted)
> >>> Date: Sat, 29 Nov 2008 03:39:58 -0200
> >>> To: computer-go@computer-go.org
> >>> CC:
> >>>
> >>>
> >>> On 28-nov-08, at 17:28, [EMAIL PROTECTED] wrote:
> >>>
> >>>> I would be very interested to see the RAVE code from Valkyria.
> >>>> I'm sure others would be too.
> >>>>
> >>>
> >>> I'm much more interested in a general, concise description. If
> >>> such a description cannot be given easily, then I think there's
> >>> little point including it in the definition of an MCTS reference
> >>> engine.
> >>>
> >>> I found a serious flaw in my code collecting the AMAF scores,
> >>> which explains why I wasn't seeing any gains so far with AMAF
> >>> turned on. Now, over the first 100+ games, UCT+RAVE scores 90%
> >>> against plain UCT. I'm going to run a test overnight, but so far
> >>> it looks good. It should have collected a few thousand samples by
> >>> tomorrow.
> >>>
> >>> Hopefully next week I can go back to testing my list of playout
> >>> improvements, which is why I started making the MCTS reference
> >>> implementation in the first place. This RAVE stuff caused a bit
> >>> of a distraction, but it's nice to have if it works.
> >>>
> >>> Mark
> >>> _______________________________________________
> >>> computer-go mailing list
> >>> computer-go@computer-go.org
> >>> http://www.computer-go.org/mailman/listinfo/computer-go/
> >>
> >>
> 

