2009/10/17 Petr Baudis <pa...@ucw.cz>

> On Fri, Oct 16, 2009 at 08:55:34PM +0200, "Ingo Althöfer" wrote:
> > In the year 2000 I bought the book
> > "EZ-GO: Oriental Strategy in a Nutshell",
> > by Bruce and Sue Wilcox. Ki Press; 1996.
> >
> > I can only recommend it for its many fresh ideas.
> > A few days ago I found time to read it again.
> >
> > This time I was impressed by Bruce Wilcox's strange
> > opening "Great Wall", where Black starts with a loose
> > wall of 5 stones spanning the whole board.
> >
> > Bruce proposes to play this setup as a surprise weapon,
> > even against stronger opponents.
> >
> > Now I have run some autoplay tests, starting from the end
> > position given in the appendix of this mail.
> > * one game with Leela 3.16; Black won.
> > * four games with MFoG 12.016; two wins each for Black and White.
> > So there is some indication that the Great Wall works even
> > for bots, which are not affected by psychology.
>
> In general, especially in an environment as stochastic as MCTS, these
> are awfully small samples. To get even to a +/-10% confidence interval,
> you need at least 100 (that is, ONE HUNDRED) games. Otherwise the
> results aren't statistically meaningful at all, as I have myself
> painfully discovered so often ;-) - they can be too heavily distorted.
>

100 games doesn't even tell you much unless the difference is pretty large.


In the testing I do, 10,000 games between players are required before I can
even start thinking about making a decision. When I tune an evaluation
function (or search algorithm) for chess by playing games against various
opponents, many small but useful evaluation parameters each contribute less
than 10 Elo points to the strength. 10,000 games isn't really enough to
accept some changes, but I take it on faith once the error margins are down
to +/- a few Elo points; I have to, given the limited resources available
to me. If a change slows the program down but appears to make up for it
with extra quality, I am even more paranoid about accepting it, because a
few "random" slowdowns that each have a chance of weakening the program can
kill it.
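A quick way to sanity-check those margins (a back-of-the-envelope sketch,
not part of my actual testing harness): near a 50% score, one Elo point is
worth roughly 0.14% in winning percentage, so the binomial standard error
over N games converts directly into an Elo error margin.

```python
import math

def elo_margin(n_games, p=0.5, z=1.96):
    """Approximate 95% error margin, in Elo, on a score of p
    measured over n_games (simple binomial model, draws ignored)."""
    se = math.sqrt(p * (1.0 - p) / n_games)       # std. error of the score
    # Near p = 0.5, d(score)/d(Elo) = ln(10)/400 * p*(1-p) ~= 0.00144
    slope = math.log(10) / 400.0 * p * (1.0 - p)
    return z * se / slope

print(round(elo_margin(100)))    # 68 -- roughly the +/-10% case above
print(round(elo_margin(10000)))  # 7  -- down to a few Elo
```

So 100 games can only resolve a difference on the order of 70 Elo, while
10,000 games brings the margin down to roughly +/- 7 Elo.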

I have found it very common to get what seems to be a convincing lead after
200 or 300 games, only to see it come crashing down. I have ramped up the
strength of the program by over 100 Elo with a large number of small Elo
improvements, but if I start accepting larger error margins, the changes
become almost random.

Of course, a few hundred games is plenty if you are talking about a major
improvement.
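Inverting the same sketch gives a feel for why a few hundred games suffice
for a big jump but not for a small tweak (again just an illustrative
approximation, using the near-50% slope throughout):

```python
import math

def games_needed(elo_diff, z=1.96):
    """Games needed before the 95% margin on a ~50% score
    shrinks below elo_diff (same crude binomial model)."""
    slope = math.log(10) / 400.0 * 0.25        # score per Elo near 50%
    se_needed = elo_diff * slope / z           # required standard error
    return math.ceil(0.25 / se_needed ** 2)    # se = sqrt(p*(1-p)/n)

print(games_needed(100))  # under 50 games: a 100-Elo jump shows up fast
print(games_needed(10))   # several thousand games for a 10-Elo tweak
```

By this estimate a 100-Elo improvement is visible almost immediately,
while a 10-Elo tweak needs thousands of games just to reach borderline
significance - and considerably more to be confident.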

I know people who claim they can look at the games themselves and make a
good judgment. I don't even begin to believe that, because the human brain
is so suggestible. If you know what change you made and then look at games,
it's very difficult to stop the brain from interpreting many of the moves
in terms of that change. It's still useful to look at games if you use
great caution, but mainly to look for bugs and side effects - and when you
think you see something, you have to chase it down to verify that you saw
what you think you saw!

- Don





>
> --
>                                Petr "Pasky" Baudis
> A lot of people have my books on their bookshelves.
> That's the problem, they need to read them. -- Don Knuth
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
>