Rémi,

After sending this last message to the list I thought of a couple of things.
Just a few weeks ago I lowered the default rating by about 400 Elo.  The
default rating is the initial rating a new player receives before any of its
games have been rated.

I didn't believe this would be a problem, because if you play someone who has
fewer than a few games, the K constant in the Elo formula is very small for
you but large for the new player.  It seems better for new players to earn
their way upwards than the reverse, and many weak new players were starting
with high ratings they didn't deserve.
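
To make that concrete, here is a minimal sketch of the standard Elo update
with per-player K factors, in Python (the K values and ratings here are made
up for illustration; they are not the actual CGOS constants):

# Standard Elo update with a per-player K factor.
def elo_update(rating, opp_rating, score, k):
    # score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
    return rating + k * (score - expected)

# An established player uses a tiny K and a newcomer a large one, so the
# newcomer's rating moves quickly while the veteran's barely budges:
print(elo_update(1800.0, 1200.0, 1.0, k=2.0))   # veteran wins:   ~1800.06
print(elo_update(1200.0, 1800.0, 0.0, k=32.0))  # newcomer loses: ~1199.02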

But it could very well be that the anchor player cannot absorb the difference
fast enough.  I probably need a better way to initially establish a player's
rating.  A simple way is to not rate the first few games against new players,
as in the sketch below.
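
A minimal sketch of that idea, building on the update function above
(PROVISIONAL_GAMES is a threshold I made up for illustration, not an actual
CGOS setting):

# Leave the established player's rating untouched while the opponent
# is still provisional; the newcomer itself is always rated normally,
# so its rating can still converge quickly.
PROVISIONAL_GAMES = 5

def maybe_update(rating, opp_rating, opp_games, score, k):
    if opp_games < PROVISIONAL_GAMES:
        return rating   # opponent still provisional: no change
    return elo_update(rating, opp_rating, score, k)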
 

I'm also doing another mini-study to test your theory that intransitivity is
a huge problem.  I'm running individual matches between GnuGo and versions of
Mogo, so we will see how Mogo does against GnuGo by itself at various
doublings of the play-out count.  I don't need to go too high, since we are
concerned with the effect on CGOS, so this study should only take a few days
at most.  After that, we can consider how Mogo scales against itself.

I've already run a few dozen games.  Here are the results so far:

[EMAIL PROTECTED]:~/TEST_OF_SCALABILITY/TestScale$ evalgo t01.db stat

PLAYER        TIME/GME   RATING  GAMES     WIN%     Total games: 34
------------  --------  -------  -----  -------
Gnugo-3.7.11     84.89   1800.0     34   100.00   Gnugo level 10
Mogo_01           0.08   1153.2     34     0.00   Mogo at 64 play-outs

[EMAIL PROTECTED]:~/TEST_OF_SCALABILITY/TestScale$ evalgo t02.db stat

PLAYER        TIME/GME   RATING  GAMES     WIN%     Total games: 26
------------  --------  -------  -----  -------
Gnugo-3.7.11     95.96   1800.0     26   100.00   Gnugo level 10
Mogo_02           0.11   1194.1     26     0.00   Mogo at 128 play-outs

[EMAIL PROTECTED]:~/TEST_OF_SCALABILITY/TestScale$ evalgo t03.db stat

PLAYER        TIME/GME   RATING  GAMES     WIN%     Total games: 24
------------  --------  -------  -----  -------
Gnugo-3.7.11     91.66   1800.0     24    83.33   Gnugo level 10
Mogo_03           0.18   1540.4     24    16.67   Mogo at 256 play-outs


I'll expand this to higher levels once I have a significant amount of data
at the current ones.
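
If you want to sanity-check these numbers, the raw logistic Elo estimate
from a win percentage against the 1800-anchored GnuGo is easy to compute (a
rough sketch only - evalgo's figures won't match it exactly, and the raw
formula blows up at 0% or 100%, which evalgo evidently handles differently):

import math

# Invert the logistic Elo curve: a win rate p against an anchor rated
# 1800 implies a rating of about 1800 + 400 * log10(p / (1 - p)).
def elo_from_winrate(p, anchor=1800.0):
    return anchor + 400.0 * math.log10(p / (1.0 - p))

print(elo_from_winrate(0.1667))  # ~1520, vs evalgo's 1540.4 for Mogo_03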


GnuGo here is gnugo-3.7.11 at level 10.

- Don




Don Dailey wrote:
> Rémi Coulom wrote:
>   
>> I believe the main problem is that the Elo-rating model is wrong for
>> bots. The phenomenon with Mogo is probably the same as Crazy Stone: if
>> there are enough strong MC bots playing to shield the top MC programs
>> from playing against GNU, then they'll get a high rating because they
>> are efficient at beating other MC bots. Otherwise, they are forced to
>> play against GNU, and lose points.
>>
>> For instance:
>> http://www.lri.fr/~teytaud/cross/CS-9-17-2CPU.html
>> GNU     1946     22 / 27     81.48
>> GnuCvs-10     1969     26 / 31     83.87
>> AyaMC637_4CPU     2108     18 / 19     94.74
>>     
> I don't see that.    Crazy Stone beat GnuCvs-10 83.87% of the time - based
> on its rating it should be winning just slightly more, but the ratings are
> very noisy.   This doesn't seem the least bit unusual.
>
> Similar with GNU: a little worse than it should be doing, but suppose it
> had won just 2 more games out of 27?    It would jump to about 90% from
> 81.48, and you could argue that it did BETTER than it should have done
> against this bot had that happened.     Low sample size.
>
> If you want to pick out data points,  what about AnotherGNU37 rated
> 1921?   CrazyStone won 100% of the games,  much better than it should
> have done against a 1946 program if you ignore the fact that only 4
> games were played.
>
> I also noticed that this version was not on the December allTime rating
> list, which means it didn't even play 200 games.
>
> I have no doubt that there might be some intransitivities between
> programs,  but I see no evidence that it is worse than in humans or in
> other computer games.
>
> - Don
>
>> A very easy way to get over-evaluated on CGOS is to have two versions
>> of the same program that play each other. For instance, if I connect
>> CS-2CPU and CS-8CPU, they will play most of their games against each
>> other, and CS-8CPU will get an incredible rating.
>>
>> Just incorporate GNU in Don's scalability study, and the rating range
>> will shrink a lot.
>>
>> Rémi
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
