On Thu, Aug 4, 2011 at 4:41 PM, David Fotland <[email protected]> wrote:

> Remember that the confidence interval is two sided, so 3% means plus or
> minus 3%.  So 52% win rate is within +- 3% of 50%.
>
>
Yes.
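
To make the two-sided reading concrete, here is a small Python sketch that applies the formula quoted further down in this thread, 1.96 * sqrt(wr * (1 - wr) / trials), to the 449-415 score between the two fuegos. The function name margin_of_error is only for illustration and does not come from any tool mentioned here.

```python
import math

def margin_of_error(wins, losses, z=1.96):
    """Return (win rate, half-width of the normal-approximation interval)."""
    trials = wins + losses
    wr = wins / trials
    return wr, z * math.sqrt(wr * (1 - wr) / trials)

# Score between the two fuego versions reported in the thread.
wr, moe = margin_of_error(449, 415)
low, high = wr - moe, wr + moe

print(f"win rate {wr:.3f}, interval [{low:.3f}, {high:.3f}]")
print("50% inside interval:", low <= 0.5 <= high)
```

The interval extends on both sides of the observed win rate, and 50% falls inside it, so this score alone does not separate the two versions.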

Something else that is rarely considered: the error margin does not mean
what you think it does if you pick and choose when to observe it. For
example, you can't just stop the test because you like the current result
and error margin.
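
A rough simulation can illustrate the effect. The sketch below is only an assumption-laden toy: two identical bots (true win rate 50%), a hypothetical schedule of checking the result every 50 games up to 1000, and the simple 1.96-sigma formula from this thread as the stopping test. It counts how often a tester who peeks would have seen a "significant" difference at some point, compared with looking only once at the end.

```python
import math
import random

def looks_significant(wins, games, z=1.96):
    """True if 50% falls outside the z-sigma interval around the win rate."""
    wr = wins / games
    return abs(wr - 0.5) > z * math.sqrt(wr * (1 - wr) / games)

def false_alarm_rates(max_games=1000, check_every=50, runs=1000, seed=1):
    """Simulate identical bots; return (rate when peeking, rate at fixed end)."""
    rng = random.Random(seed)
    peeking = fixed = 0
    for _ in range(runs):
        wins = 0
        flagged = False
        for g in range(1, max_games + 1):
            wins += rng.random() < 0.5  # each game is a fair coin flip
            if g % check_every == 0 and looks_significant(wins, g):
                flagged = True  # a tester who peeks could have stopped here
        peeking += flagged
        fixed += looks_significant(wins, max_games)
    return peeking / runs, fixed / runs

peek_rate, fixed_rate = false_alarm_rates()
print(f"false alarms when peeking every 50 games: {peek_rate:.1%}")
print(f"false alarms with a fixed 1000 games:     {fixed_rate:.1%}")
```

With these settings the peeking rate should come out well above the nominal 5%, while the fixed-length rate stays near it; the exact figures depend on the seed and the checking schedule.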

The correct way to interpret the error margin is to decide in advance
exactly how many games you are going to play. Only then does the error
margin mean what it is supposed to mean (also keeping in mind that it is
two-sided).
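
One way to fix the number of games in advance is to invert the formula quoted further down in this thread and solve for the number of trials. This is a sketch under the assumption that the simple normal approximation is good enough and that the win rate is close to 50% (the worst case for the margin); games_needed is a hypothetical helper name.

```python
import math

def games_needed(margin, z=1.96, wr=0.5):
    """Smallest number of games whose z-sigma margin is at most `margin`."""
    return math.ceil(z * z * wr * (1 - wr) / (margin * margin))

for margin in (0.03, 0.02, 0.01):
    print(f"+-{margin:.0%}: {games_needed(margin)} games")
```

For a margin of +-3% this gives a little under 1100 games, and halving the margin roughly quadruples the number of games required.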

When I test I use bayeselo and I set the confidence to 99% instead of the
standard 95%, because we are not as strict as we should be about this
(due to limited resources we must be able to stop tests early). But we are
at least aware of the problems and issues.
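
To give a feel for what moving from 95% to 99% costs, the sketch below compares the two-sided multipliers under the simple normal formula from this thread. It does not reproduce what bayeselo computes internally; it only shows how much wider the margin becomes for the same data, using a 52% win rate over 864 games as an example.

```python
import math
from statistics import NormalDist

def two_sided_z(confidence):
    """z multiplier for a two-sided interval at the given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin(wr, trials, confidence):
    """Normal-approximation margin of error for a win rate."""
    return two_sided_z(confidence) * math.sqrt(wr * (1 - wr) / trials)

for conf in (0.95, 0.99):
    z = two_sided_z(conf)
    m = margin(0.52, 864, conf)
    print(f"{conf:.0%}: z = {z:.3f}, margin = +-{m:.3f}")
```

The 99% multiplier is about 2.58 instead of 1.96, so the margin is roughly 30% wider for the same number of games.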

Don




> David
>
> > -----Original Message-----
> > From: [email protected] [mailto:computer-go-
> > [email protected]] On Behalf Of Vlad Dumitrescu
> > Sent: Thursday, August 04, 2011 1:14 PM
> > To: [email protected]
> > Subject: Re: [Computer-go] testing improvements
> >
> > Hi,
> >
> > On Thu, Aug 4, 2011 at 19:29, David Fotland <[email protected]> wrote:
> > > Did each fuego play the same number of games vs gnugo, and did each
> > > play half its games on each color?
> >
> > Yes, I set up an all-play-all competition with gomill.
> >
> > On Thu, Aug 4, 2011 at 19:55, Erik van der Werf
> > <[email protected]> wrote:
> > > On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu <[email protected]>
> > wrote:
> > >  The scores towards gnugo are almost
> > >> identical, but the two fuegos score 449-415, which is 52% and the 95%
> > >> confidence is ~3%, i.e. ~10 ELO.
> > >
> > > That 3% is not a 95% confidence interval, more like 1 standard
> > > deviation... (so nothing with high confidence yet)
> >
> > I took the easy way out and used a formula mentioned by David Fotland
> > on this list a while ago:
> >
> > >There is a simple formula to estimate the confidence interval of a
> > >result. I use it to see if a new version is likely better than a
> > >reference version (but I use 95% confidence intervals, so over hundreds
> > >of experiments it gives me the wrong answer too often).
> > >1.96 * sqrt(wr * (1 - wr) / trials)
> > >Where wr is the win rate of one version vs the reference, and trials is
> > >the number of test games.
> >
> > On Thu, Aug 4, 2011 at 20:21, Kahn Jonas <[email protected]> wrote:
> > > All the more since you're testing the same idea on two bots
> > > simultaneously. So if you want to be wrong at most five percent of the
> > > time, and consider you are better as soon as one of the bots gets
> > > better, you have to make individual tests at the 2.5% level.
> >
> > At the moment I ran the bots without any modification, to see if
> > everything works fine. So I think that the results between the
> > identical bots should have been closer to 50% or at least to swing
> > sometimes to the other side of 50%. Right now it's 625-566, which is
> > 52.5%, with a 95% margin of +-2.83% according to the formula above.
> >
> > The results are
> > fuego-1.1 v fuego-new (1199/2000 games)
> > unknown results: 1 0.08%
> > board size: 9   komi: 6.5
> >             wins              black          white        avg cpu
> > fuego-1.1    569 47.46%       386 64.33%     183 30.55%      2.69
> > fuego-new    629 52.46%       415 69.28%     214 35.67%      2.67
> >                               801 66.81%     397 33.11%
> >
> > I realize that statistical results don't always match what one would
> > expect, but this should be a straightforward case...
> >
> > Thanks a lot for all the answers!
> >
> > regards,
> > /Vlad
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
