On Sat, Oct 17, 2009 at 08:36:13AM -0400, Don Dailey wrote:
> 2009/10/17 Petr Baudis <pa...@ucw.cz>
>
> > On Fri, Oct 16, 2009 at 08:55:34PM +0200, "Ingo Althöfer" wrote:
> > > In the year 2000 I bought the book
> > > "EZ-GO: Oriental Strategy in a Nutshell",
> > > by Bruce and Sue Wilcox. Ki Press; 1996.
> > >
> > > I can only recommend it for the many fresh ideas.
> > > A few days ago I found time again to read in it.
> > >
> > > This time I was impressed by Bruce Wilcox's strange
> > > opening "Great Wall", where Black starts with a loose
> > > wall made of 5 stones, spanning over the whole board.
> > >
> > > Bruce proposes to play this setup as a surprise weapon,
> > > even against stronger opponents.
> > >
> > > Now I made some autoplay tests, starting from the end position
> > > given in the appendix of this mail.
> > > * one game with Leela 3.16; Black won.
> > > * four games with MFoG 12.016; two wins each for Black and White.
> > > So there is some indication that the Great Wall works even
> > > for bots, who are not affected by psychology.
> >
> > In general, especially in an environment as stochastic as MCTS, these
> > are awfully small samples. To get even into a +-10% confidence
> > interval, you need at least 100 (that is, ONE HUNDRED) games.
> > Otherwise, the results aren't statistically meaningful at all, as I
> > have myself painfully discovered so often ;-) - they can be too
> > heavily distorted.
> >
>
> 100 games doesn't even tell you much unless the difference is pretty large.

Well, this is simple math. With 100 Bernoulli trials, your 95% confidence
interval is about +-10% if your rates are around 50%. Of course, if the
results you want to compare differ by less than 20%, their intervals
overlap and you will need more trials. :-)

When I'm too lazy to compute this myself, or for some reason am not using
gogui-twogtp, which computes the error (confidence_interval/1.96) for me,
I find http://statpages.org/confint.html pretty handy for quick
calculations.

(To convert win rates to Elo differences, I found
http://www.chesselo.com/probabil.html useful, but I don't find Elo too
useful for basic improvement testing, since I compare only win rates
against a single reference player.)

-- 
				Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
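
For anyone who wants to redo the arithmetic from the reply above without a
web calculator, here is a minimal sketch. It assumes the usual normal
approximation to the binomial, which is rough for very small samples, and
the standard logistic Elo model. The function names are made up for
illustration and do not come from gogui-twogtp or either of the linked
pages.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def ci_half_width(games, win_rate=0.5, z=Z95):
    """Half-width of the normal-approximation interval for a win rate."""
    return z * math.sqrt(win_rate * (1.0 - win_rate) / games)


def games_needed(half_width, win_rate=0.5, z=Z95):
    """Smallest number of games giving at most the requested half-width."""
    return math.ceil(z * z * win_rate * (1.0 - win_rate) / half_width ** 2)


def elo_difference(win_rate):
    """Elo difference implied by a win rate (logistic model, 0 < rate < 1)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # Interval half-width, in percent, for a few sample sizes at a 50% rate.
    for n in (5, 100, 400, 1600):
        print(n, "games: +-%.1f%%" % (100.0 * ci_half_width(n)))
    print("games for +-10%:", games_needed(0.10))
    print("Elo difference at a 60% win rate: %.0f" % elo_difference(0.60))
```

At a 50% win rate, 100 games give a half-width of about 9.8%, which is
where the "+-10%" figure comes from. The half-width shrinks only with the
square root of the number of games, so halving it takes four times as many
games.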