Dave,

I have been in computer chess for years and this is a common problem.  

One computer chess team I know of uses over 50,000 games to test most
of their improvements, simply because that many games are required to
have even reasonable certainty that something good happened or didn't
happen.  Of course, if your improvements are MAJOR you can determine
that with fewer games, but the vast majority of improvements are not
major.  Top programs have to think in terms of 5 or 10 ELO points per
modification, or even less.

Once you have a very well developed program, you can't expect to gain 25
ELO very often, and it's even less likely if you already have one of the
best programs.
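Those numbers follow from simple binomial statistics: a small ELO edge
maps to an expected score barely above 50%, and you need enough games
for that edge to stand out from the noise. Here is a back-of-the-envelope
sketch (my own illustration, not that team's actual method), ignoring
draws and treating each game as a coin flip:

```python
import math

def expected_score(elo_diff):
    # Expected score of the stronger side under the standard Elo model.
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    # Games needed for the winning edge to exceed z standard errors
    # (z = 1.96 is roughly 95% confidence).  The per-game standard
    # deviation is taken as 0.5, a slight overestimate near even strength.
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * 0.5 / edge) ** 2)

print(games_needed(5))    # a 5 ELO edge needs on the order of 20,000 games
print(games_needed(25))   # a 25 ELO edge shows up in well under 1,000
```

Draws shrink the per-game variance, so real requirements are somewhat
lower, but the order of magnitude is the point.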

Unfortunately, it's not practical for most of us to play that many
games for each little tweak, so how to measure improvement remains a
difficult problem.  I think this team, and probably others, have
several multi-processor machines available and run games very quickly,
about one game every second or two per processor.  With the top programs
it's possible to run that fast and still play at or near master level.

But no matter how fast your computer is, you want to run games that are
representative of how you will actually play for real.  Out of necessity
you have to compromise somewhere, unless you happen to have a hundred or
more computers lying around.  It's almost useless to trust just a few
hundred games, so one approximation is to play games 2 or 3 orders of
magnitude faster than you expect to run normally.  This may not fully
test certain kinds of improvements; the only solution is more power, or
more time!
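To see why a few hundred games proves so little, you can turn a match
score back into an ELO estimate with a confidence interval. A sketch
under the same simplifying assumptions as above (win/loss games only;
count each draw as half a win):

```python
import math

def elo_from_score(score):
    # Invert the Elo expected-score formula (requires 0 < score < 1).
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, games, z=1.96):
    # Observed Elo difference with an approximate 95% interval (z = 1.96),
    # using the binomial standard error of the win rate.
    p = wins / games
    half = z * math.sqrt(p * (1.0 - p) / games)
    return elo_from_score(p - half), elo_from_score(p), elo_from_score(p + half)

lo, mid, hi = elo_interval(160, 300)   # 160 wins out of 300 games
# mid is about +23 ELO, but the interval runs from roughly -16 to +63:
# the "improvement" could easily be nothing at all.
```

Exactly the trap Dave describes below: over a few hundred games, random
fluctuation alone covers the whole range you are hoping to measure.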

- Don
    



On Wed, 2008-10-29 at 17:31 -0700, Dave Dyer wrote:
> Here's a chance to share an amusing and illustrative anecdote.
> 
> I was working on optimizing "Goodbot",  a program that plays Tantrix,
> and because of the nature of the game, the only way to really qualify
> an improvement is to run many test games against a standard opponent.
> 
> At one point, I was making nightly "test runs" of a few hundred games,
> with slightly tweaked parameters and adjustments to the underlying
> algorithms, always seeking to improve over the reference standard.
> Eventually,
> I discovered that due to some temporary code accidentally left in
> place, I had actually been running exactly the same code for about
> a month, and all perceived movement in "better" or "worse" directions
> was only due to random fluctuations.
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
