2009/10/17 Petr Baudis <pa...@ucw.cz>:

> On Fri, Oct 16, 2009 at 08:55:34PM +0200, "Ingo Althöfer" wrote:
> > In the year 2000 I bought the book
> > "EZ-GO: Oriental Strategy in a Nutshell"
> > by Bruce and Sue Wilcox (Ki Press, 1996).
> >
> > I can only recommend it for its many fresh ideas.
> > A few days ago I found time to read it again.
> >
> > This time I was impressed by Bruce Wilcox's strange
> > "Great Wall" opening, where Black starts with a loose
> > wall of 5 stones spanning the whole board.
> >
> > Bruce proposes playing this setup as a surprise weapon,
> > even against stronger opponents.
> >
> > Now I have made some autoplay tests, starting from the end
> > position given in the appendix of this mail:
> > * one game with Leela 3.16; Black won.
> > * four games with MFoG 12.016; two wins each for Black and White.
> > So there is some indication that the Great Wall works even
> > for bots, which are not affected by psychology.
>
> In general, especially in an environment as stochastic as MCTS, these
> are awfully small samples. To get even into a ±10% confidence interval,
> you need at least 100 (that is, ONE HUNDRED) games. Otherwise the
> results aren't statistically meaningful at all, as I have myself
> painfully discovered so often ;-) - they can be too heavily distorted.
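Petr's rule of thumb can be checked with the normal approximation to the binomial: the standard error of a win rate p measured over n games is sqrt(p(1-p)/n). A minimal sketch (the function name and interface are illustrative, not from either mail):

```python
import math

def win_rate_ci(wins, games, z=1.96):
    """Normal-approximation confidence interval for a measured win rate.

    z=1.96 gives the usual 95% interval.
    """
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    return p - half_width, p + half_width

# With 100 games at a 50% win rate the 95% interval is about +/- 10%:
lo, hi = win_rate_ci(50, 100)
print(f"[{lo:.3f}, {hi:.3f}]")   # [0.402, 0.598]
```

At n = 100 and p = 0.5 the interval is about ±9.8%, matching the "at least 100 games for ±10%" figure; it also shows why 1 or 4 games, as in the autoplay test above, decide nothing.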
100 games doesn't tell you much either, unless the difference is pretty
large. In the testing I do, 10,000 games between players are required
before I can even start thinking about making a decision. When I tune an
evaluation function (and search algorithms) for chess by playing games
against various opponents, many small but useful evaluation parameters
contribute less than 10 Elo points to the strength. 10,000 games isn't
really enough to accept some changes, but I take it on faith once the
error margins are down to +/- a few Elo points; I have to, given the
limited resources available to me. If a change slows the program down
but appears to make up for it with extra quality, I am even more
paranoid about accepting it, because a few "random" slowdowns that
happen to weaken the program can kill it.

I have found it very common to get what seems like a convincing lead
after 200 or 300 games, only to see it come crashing down. I have ramped
up the strength of the program by over 100 Elo through a large number of
small improvements, but if I start accepting larger error margins the
changes become almost random. Of course, a few hundred games is plenty
if you are talking about a major improvement.

I know people who claim they can look at the games themselves and make a
good judgment. I don't even begin to believe that, because the human
brain is so suggestible: if you know what change you made and you look
at the games, it's very difficult to stop the brain from interpreting
many of the moves in terms of that change. It's still useful to look at
games if you use great caution, mainly to look for bugs and side
effects; when you think you see something, you have to chase it down to
check that you saw what you think you saw!

- Don

> --
> Petr "Pasky" Baudis
> A lot of people have my books on their bookshelves.
> That's the problem, they need to read them.
>         -- Don Knuth
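Don's numbers can be sanity-checked against the Elo model, under which an edge of d points corresponds to an expected score of 1/(1 + 10^(-d/400)). The sketch below uses helper names of my own; the detection threshold is a bare two-sigma interval with no allowance for statistical power, so real testing needs even more games than this lower bound:

```python
import math

def elo_to_score(elo_diff):
    """Expected score of the stronger player under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_to_resolve(elo_diff, z=1.96):
    """Rough lower bound on the games needed before a z-sigma interval
    around 50% excludes the score implied by elo_diff (normal
    approximation; ignores statistical power)."""
    delta = elo_to_score(elo_diff) - 0.5
    return math.ceil((z * 0.5 / delta) ** 2)

print(games_to_resolve(10))   # a 10-Elo edge: several thousand games
print(games_to_resolve(100))  # a 100-Elo jump: well under a hundred
```

This is consistent with the thread: resolving a sub-10-Elo tweak takes thousands of games, while a convincing-looking lead after 200 or 300 games can easily be noise for any small improvement.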
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/