Dave, I have been in computer chess for years and this is a common problem.
One computer chess team I know of uses over 50,000 games to test most of their improvements, simply because that many games are required to have even reasonable certainty that something good happened or didn't happen. Of course if your improvements are MAJOR improvements you can determine that with fewer games, but the vast majority of improvements are not major. Top programs have to think in terms of 5 or 10 Elo points of improvement, or even less, with the modifications they make. Once you have a very well developed program, you can't expect to gain 25 Elo very often, and this is even less likely if you have one of the best programs.

Unfortunately, it's not practical for most of us to play that many games for each little tweak, so it's a difficult problem to solve (how to measure improvement). I think this team, and probably others, have several multi-processor machines available and run games very quickly, about one game every second or two per processor. With the top programs it's possible to run that fast and still play at or near master level. But no matter how fast your computer is, you want to run games that are representative of how you will actually play for real. You have to make a compromise somewhere out of necessity unless you happen to have a hundred or more computers lying around. It's almost useless to trust just a few hundred games, so one approximation is to play games that are 2 or 3 orders of magnitude faster than you expect to run normally. This may not fully test certain kinds of improvements. The only solution is more power, or more time!

- Don

On Wed, 2008-10-29 at 17:31 -0700, Dave Dyer wrote:
> Here's a chance to share an amusing and illustrative anecdote.
>
> I was working on optimizing "Goodbot", a program that plays Tantrix,
> and because of the nature of the game, the only way to really qualify
> an improvement is to run many test games against a standard opponent.
> At one point, I was making nightly "test runs" of a few hundred games,
> with slightly tweaked parameters and underlying tweaks to the algorithms,
> always seeking to improve over the reference standard. At one point,
> I discovered that due to some temporary code accidentally left in
> place, I had actually been running exactly the same code for about
> a month, and all perceived movement in "better" or "worse" directions
> was only due to random fluctuations.
>
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
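The sample sizes Don mentions can be checked with back-of-the-envelope statistics. A minimal sketch (my own illustration, not anything from either program): under the logistic Elo model, a 5 Elo edge shifts the expected score by less than 1%, and resolving that shift from the noise of a binomial win/loss record takes tens of thousands of games, while a few hundred games of the *same* program against itself will wander noticeably, just as in Dave's anecdote.

```python
import math
import random

def elo_to_winprob(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Rough game count to detect elo_diff at ~95% confidence.

    Normal approximation: the measured score must differ from 0.5 by
    about z standard errors, and SE = 0.5 / sqrt(n) for near-even
    opponents (draws ignored for simplicity).
    """
    delta = elo_to_winprob(elo_diff) - 0.5
    return math.ceil((z * 0.5 / delta) ** 2)

print(games_needed(5))    # tens of thousands of games for a 5 Elo edge
print(games_needed(25))   # under a thousand for a 25 Elo edge

# Dave's anecdote in miniature: identical programs (true score 0.5),
# "tested" in nightly batches of 300 games, appear to drift.
random.seed(1)
for batch in range(5):
    wins = sum(random.random() < 0.5 for _ in range(300))
    print(f"batch {batch}: score {wins / 300:.3f}")
```

The take-away matches both messages: at a few hundred games per night, batch-to-batch noise is comparable to the effect sizes being hunted.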