On Sat, Oct 17, 2009 at 08:36:13AM -0400, Don Dailey wrote:
> 2009/10/17 Petr Baudis <pa...@ucw.cz>
> 
> > On Fri, Oct 16, 2009 at 08:55:34PM +0200, "Ingo Althöfer" wrote:
> > > In the year 2000 I bought the book
> > > "EZ-GO: Oriental Strategy in a Nutshell",
> > > by Bruce and Sue Wilcox. Ki Press; 1996.
> > >
> > > I can only recommend it for the many fresh ideas.
> > > A few days ago I found time again to read in it.
> > >
> > > This time I was impressed by Bruce Wilcox's strange
> > > opening "Great Wall", where Black starts with a loose
> > > wall made of 5 stones, spanning over the whole board.
> > >
> > > Bruce proposes to play this setup as a surprise weapon,
> > > even against stronger opponents.
> > >
> > > Now I made some autoplay tests, starting from the end position
> > > given in the appendix of this mail.
> > > * one game with Leela 3.16; Black won.
> > > * four games with MFoG 12.016; two wins each for Black and White.
> > > So there is some indication that the Great Wall works even
> > > for bots, who are not affected by psychology.
> >
> > In general, especially in an environment as stochastic as MCTS, these are
> > awfully small samples. To get even into a +-10% confidence interval, you
> > need at least 100 (that is, ONE HUNDRED) games. Otherwise, the results
> > aren't statistically meaningful at all, as I have myself painfully
> > discovered so often ;-) - they can be too heavily distorted.
> >
> 
> 100 games don't even tell you much unless the difference is pretty large.

Well, this is simple math. With 100 Bernoulli trials, your
95% confidence interval is about +-10% if your win rates are around 50%.
Of course, if the results you want to compare differ by less than
20%, you will need more trials. :-)
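
As a rough sketch of that arithmetic, assuming the usual normal
approximation to the binomial (which is poor for tiny samples such as
four or five games), the half-width of the 95% interval and the number
of games needed for a given half-width can be computed as below. The
function names are made up for illustration.

```python
import math

def ci_half_width(wins, games, z=1.96):
    """Half-width of the confidence interval for a win rate.

    Uses the normal approximation: z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to a 95% interval.
    """
    p = wins / games
    return z * math.sqrt(p * (1.0 - p) / games)

def games_needed(half_width, p=0.5, z=1.96):
    """Smallest number of games giving at most the requested half-width,
    assuming a true win rate of p (0.5 is the worst case)."""
    return math.ceil(z * z * p * (1.0 - p) / half_width ** 2)

if __name__ == "__main__":
    # 50 wins out of 100 games: roughly +-0.098, i.e. about +-10%.
    print(ci_half_width(50, 100))
    # Games needed for +-10% and for +-5% at a 50% win rate.
    print(games_needed(0.10), games_needed(0.05))
```

Halving the interval takes about four times as many games, which is why
small differences between two programs need such long matches.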

When I'm too lazy to compute this myself, or for some reason am not
using gogui-twogtp, which computes the error (confidence_interval/1.96)
for me, I find http://statpages.org/confint.html pretty handy for quick
calculations.

(To convert win rates to Elo differences, I found
http://www.chesselo.com/probabil.html useful, but I don't find Elo very
useful for basic improvement testing, since I only compare win rates
against a single reference player.)
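
For a quick conversion without a web page, here is a minimal sketch
using the standard logistic Elo formula. Whether the page above uses
exactly this model is an assumption on my part, and the function names
are only illustrative.

```python
import math

def winrate_to_elo(p):
    """Elo difference implied by an expected score p (0 < p < 1),
    under the logistic model: p = 1 / (1 + 10 ** (-d / 400))."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_to_winrate(d):
    """Expected score for a player rated d Elo points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))

if __name__ == "__main__":
    # A 60% win rate corresponds to roughly +70 Elo.
    print(round(winrate_to_elo(0.60), 1))
    # The two functions are inverses of each other.
    print(elo_to_winrate(winrate_to_elo(0.60)))
```

Note that a +-10% interval around 50% maps to roughly +-70 Elo, so the
uncertainty from 100 games is just as large when expressed in Elo.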

-- 
                                Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
