On Thu, Aug 4, 2011 at 22:41, David Fotland <[email protected]> wrote:
> Remember that the confidence interval is two sided, so 3% means plus or
> minus 3%.  So 52% win rate is within +- 3% of 50%.

Yes, of course. What I reacted to was that throughout the whole test, one
bot always had around 52% wins (well, after the first 100 games or so, at
least). I would have thought it would fluctuate around the true value.
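
One way to check that expectation is a quick simulation. The sketch below
is only an illustration, not part of the actual gomill test: it assumes two
identical bots (true win probability 0.5, no draws, independent games), and
the function names are made up for this example. It tracks the cumulative
win rate and counts how often it switches sides of 50% after the first 100
games. Because the cumulative rate is a running average, consecutive values
are strongly correlated, so long stretches on one side of 50% are not
unusual even for a fair coin.

```python
import random


def running_win_rate(n_games, p_win=0.5, seed=0):
    """Cumulative win rate after each game for a bot that wins each
    independent game with probability p_win (an assumed model)."""
    rng = random.Random(seed)
    wins = 0
    rates = []
    for game in range(1, n_games + 1):
        if rng.random() < p_win:
            wins += 1
        rates.append(wins / game)
    return rates


def count_crossings(rates, level=0.5, skip=100):
    """Count how often the running rate moves from one side of `level`
    to the other, ignoring the first `skip` games."""
    crossings = 0
    prev_side = 0
    for r in rates[skip:]:
        side = (r > level) - (r < level)  # +1 above, -1 below, 0 exactly on
        if side != 0:
            if prev_side != 0 and side != prev_side:
                crossings += 1
            prev_side = side
    return crossings


if __name__ == "__main__":
    # A few independent simulated matches of 1200 games each.
    for seed in range(5):
        rates = running_win_rate(1200, seed=seed)
        print(seed, round(rates[-1], 4), count_crossings(rates))
```

Running this with different seeds gives a feel for how rarely the running
rate changes sides once a few hundred games have been played.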

Thanks,
Vlad
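
P.S. For reference, here is a minimal Python sketch of the formula quoted
below, 1.96 * sqrt(wr * (1 - wr) / trials), applied to the two
fuego-vs-fuego scores mentioned in this thread (449-415 and 625-566). The
function name is made up for this example, and the sketch assumes the
normal approximation behind the formula and independent games.

```python
import math


def ci_half_width(wins, losses, z=1.96):
    """Return (win rate, half-width of the two-sided interval) using the
    normal approximation z * sqrt(wr * (1 - wr) / trials)."""
    trials = wins + losses
    wr = wins / trials
    return wr, z * math.sqrt(wr * (1 - wr) / trials)


# Scores between the two fuego versions, as quoted in the thread.
for wins, losses in [(449, 415), (625, 566)]:
    wr, hw = ci_half_width(wins, losses)
    low, high = wr - hw, wr + hw
    print(f"{wins}-{losses}: wr={wr:.4f} +-{hw:.4f} "
          f"interval=[{low:.4f}, {high:.4f}] "
          f"contains 0.5: {low <= 0.5 <= high}")
```

Since the interval is two-sided, the question is whether 50% lies inside
[wr - hw, wr + hw], not whether the half-width alone is smaller than the
observed lead.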

>> -----Original Message-----
>> From: [email protected] [mailto:computer-go-
>> [email protected]] On Behalf Of Vlad Dumitrescu
>> Sent: Thursday, August 04, 2011 1:14 PM
>> To: [email protected]
>> Subject: Re: [Computer-go] testing improvements
>>
>> Hi,
>>
>> On Thu, Aug 4, 2011 at 19:29, David Fotland <[email protected]> wrote:
>> > Did each fuego play the same number of games vs gnugo, and did each play
>> > half its games on each color?
>>
>> Yes, I set up an all-play-all competition with gomill.
>>
>> On Thu, Aug 4, 2011 at 19:55, Erik van der Werf
>> <[email protected]> wrote:
>> > On Thu, Aug 4, 2011 at 6:57 PM, Vlad Dumitrescu <[email protected]> wrote:
>> >> The scores against gnugo are almost
>> >> identical, but the two fuegos score 449-415, which is 52% and the 95%
>> >> confidence interval is ~3%, i.e. ~10 ELO.
>> >
>> > That 3% is not a 95% confidence interval, more like 1 standard
>> > deviation... (so nothing with high confidence yet)
>>
>> I took the easy way out and used a formula mentioned by David Fotland
>> on this list a while ago:
>>
>> > There is a simple formula to estimate the confidence interval of a result.
>> > I use it to see if a new version is likely better than a reference version
>> > (but I use 95% confidence intervals, so over hundreds of experiments it
>> > gives me the wrong answer too often).
>> > 1.96 * sqrt(wr * (1 - wr) / trials)
>> > Where wr is the win rate of one version vs the reference, and trials is
>> > the number of test games.
>>
>> On Thu, Aug 4, 2011 at 20:21, Kahn Jonas <[email protected]> wrote:
>> > All the more since you're testing the same idea on two bots
>> > simultaneously. So if you want to be wrong at most five percent of the
>> > time, and consider you are better as soon as one of the bots gets
>> > better, you have to make individual tests at the 2.5% level.
>>
>> For now I ran the bots without any modification, to see if
>> everything works fine. So I think that the results between the
>> identical bots should have been closer to 50%, or should at least swing
>> sometimes to the other side of 50%. Right now it's 625-566, which is
>> 52.5% with a 95% interval of +-2.83% according to the formula above.
>>
>> The results are
>> fuego-1.1 v fuego-new (1199/2000 games)
>> unknown results: 1 0.08%
>> board size: 9   komi: 6.5
>>             wins              black          white        avg cpu
>> fuego-1.1    569 47.46%       386 64.33%     183 30.55%      2.69
>> fuego-new    629 52.46%       415 69.28%     214 35.67%      2.67
>>                               801 66.81%     397 33.11%
>>
>> I realize that statistical results don't always match what one would
>> expect, but this should be a straightforward case...
>>
>> Thanks a lot for all the answers!
>>
>> regards,
>> /Vlad
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
