This is in response to a few posts about the "self-test" effect in Elo rating tests.
I'll start by claiming right up front that I don't believe, for certain types of programs, that this is something we have to worry unduly about. I'll explain why I feel that way in a moment.

One general observation: if you test 2 programs like we are doing, even though they are different programs with different authors, one has to admit that they have much in common. They are both based on Monte Carlo simulations, and they both have a global tree search component that is critical to their success. Although the details differ and one implementation is superior to the other, they are the same basic type of program. So one might conclude this is a big self-test experiment and not fully valid for that reason.

In the current test, Mogo stands alone at the top; FatMan fails to give it any competition at the upper levels for whatever reason. So one might also conclude this is a big Mogo self-test of scalability. It's natural to ask the question, "if Mogo continues to show improvement with increasing power, is it really stronger, or is it just stronger against lower-powered versions of itself?" Another way to ask this question is, "does the apparent improvement hold against other programs?" Or at least, does it hold to the same extent?

It's possible to take a given program such as Gnugo and build a program designed solely to beat it. In fact, some have claimed or proposed that by tuning their program against Gnugo, they have succeeded in making it play really well against Gnugo while improving it very little against other programs. I think David Doshay has made this assertion, or something very similar to it, about his program, which uses Gnugo as a plausible move generator and evaluation function. This is a real effect which I don't question; I have done this myself. It happens when you tune your improvement against a specific program's weaknesses (and/or strengths).
You can make a program play weaker in general but stronger against a specific opponent by tuning it a certain way: you make it ignore things the other program cannot take advantage of. I used to do that myself in chess. I could beat a certain weak player every time in less than 10 moves (just for fun), but to do it I had to play moves that were dubious at best.

But this is not the same as building a scalable program using sound principles. It's one thing to tune a program in a very specific way to do some things better (usually by sacrificing one of its strengths to some extent), but it's quite another to build a program with a general mechanism that improves its play in all areas without sacrificing skill in any other area.

A thought experiment: suppose a version of Mogo at a very high level setting can beat a lower level setting of the same Mogo 99% of the time. Is it likely this improvement is artificial and wouldn't apply to its games against other programs? No. This is a real improvement, because it's not based on making the program weaker in some area to take advantage of a specific flaw.

Is it possible that it doesn't scale quite as much against other programs? Yes, that is likely. When a program thinks exactly like another program, just deeper, it's almost as if it can read the mind of the other program. In computer chess, if you look 2 ply deeper against an identical program, you not only see exactly what it "will see" on the next move, you also see an additional move beyond that. This is a real advantage, because if you overlook something important, so did the other program! You won't miss something it can beat you over the head with. Nevertheless, you can only stretch this so far. If your superiority is substantial with a properly scalable program, it will show up as a substantial improvement against any other program too.
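To put that 99% figure in perspective, the standard logistic Elo model maps a head-to-head score directly to a rating gap: D = 400 * log10(p / (1 - p)), so a 99% win rate corresponds to roughly 800 Elo points. A quick sketch of the conversion (the function name is my own):

```python
import math

def elo_gap(win_rate):
    """Rating difference implied by a head-to-head win rate
    under the standard logistic Elo model:
    D = 400 * log10(p / (1 - p))."""
    return 400 * math.log10(win_rate / (1 - win_rate))

# A 99% self-play score implies a gap of roughly 800 Elo points;
# an even 50% score implies no gap at all.
print(round(elo_gap(0.99)))  # 798
print(elo_gap(0.50))         # 0.0
```

Whether that full 800-point gap carries over unchanged to games against unrelated opponents is exactly the question at issue here.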
If your superiority against a specific opponent is based on specific trickery (tuning) to beat that specific opponent, then it may not translate to other opponents. It's like a fighter who leaves himself open to a left hook because he knows his opponent doesn't have one: a strategy that is unsound against any other opponent, but not the one he faces now.

So my assertion is that scalability based on sound principles is more or less universal, with perhaps a small amount of self-play distortion, but nothing to get too excited about.

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/