I think one of the problems is in testing. Currently we have almost
no way to judge whether an improvement is good or bad, other than
playing a lot of games against GNU Go. That takes a very long time and
seems inefficient. Moreover, it may not even be a very good method.
GNU Go often cannot respond correctly to an obviously bad move, so
pruning such moves decreases the winning rate.

This is THE problem in game programming: measuring progress. Usually an improvement is worth 10 Elo. It takes 1000 games or more to establish such an improvement with statistical significance. Usually one does not play 1000 games; 100 games are already quite a lot. So one often ends up choosing not the best version but the luckiest one. If one version has an especially good result, I rerun the test matches under different conditions (time settings).
Only if the results are repeatable is the version considered best.
If an improvement is worth 100 Elo, there is no need for extensive testing. One sees this immediately. In fact, smaller improvements are in the end also chosen by intuition and feeling.
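
To put rough numbers on this, here is a small Python sketch. It is only a back-of-the-envelope estimate under assumptions of my own: games are independent, there are no draws (as in Go with komi), the normal approximation to the binomial holds, and "significant" means a two-sided 95% level (z = 1.96). The function names are made up for this illustration. Under these assumptions 10 Elo needs several thousand games, so the 1000 games quoted above is if anything on the low side, while 100 Elo shows up after a few dozen games.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Games until the expected edge over 50% exceeds z standard errors.

    Assumes independent games without draws and the normal approximation.
    """
    p = expected_score(elo_diff)
    edge = p - 0.5
    return math.ceil(z * z * p * (1.0 - p) / (edge * edge))

def elo_margin(games, z=1.96):
    """Half-width, in Elo, of the confidence interval around an even match."""
    p = 0.5 + z * 0.5 / math.sqrt(games)
    if p >= 1.0:
        return math.inf
    return 400.0 * math.log10(p / (1.0 - p))

if __name__ == "__main__":
    for diff in (10, 100):
        print(f"{diff:>3} Elo: about {games_needed(diff)} games")
    for n in (100, 1000):
        print(f"{n:>4} games: +/- {elo_margin(n):.0f} Elo")
```

The second loop shows the same thing from the other side: how large a difference a match of 100 or 1000 games can resolve at all.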

In Go things are worse in that there is only one standard sparring partner, GNU Go. This creates severe inbreeding effects. Chess had a similar problem. There were more strong opponents around, but over the years they became very similar. Then suddenly a new program appeared, Rybka, which plays differently, and all the inbred programs have a lot of difficulty against it.

I think there is no better way. One can do some pre-filtering with test positions. If a version is especially bad in these tests, one can ignore it. But being good on test positions and being good in games are different things.

Erdstrahlen:
Jan Louwman was a fanatical tester. His small house was full of board computers. He played 20 games at once by hand (we are in the pre-PC computer chess times). He always reported spectacular results for the programs of Ed Schroeder. But when the programs went to market, nobody could replicate Jan's results. The programs were strong, but not spectacular. Thomas Mally of the Viennese chess magazine Module explained this with the different natural radiation (German "Erdstrahlen") in Rotterdam and elsewhere. Ed's programs were optimized for these "Erdstrahlen". The "Erdstrahlen-Theorie" became a running joke in the chess community. Whenever two testers reported quite different results, it was "explained" by the different amount of "Erdstrahlen".

It is impossible to play 1000 games by hand for each version. Jan usually played at 30 seconds or 1 minute per move. It would have taken forever. His spectacular version was just a very lucky one. If you play enough, you always get one. But his testing was certainly a significant contribution to the development of Rebel. And it was very good medicine for Jan. He would have died much earlier without this testing.
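
How easily a "lucky version" turns up can be estimated with an exact binomial calculation. The sketch below is illustrative only; the numbers (20 candidate versions, 100 games each, 60 wins as "spectacular") and the function names are assumptions of mine, not Jan's actual test conditions. It assumes every version is exactly as strong as its opponent, with independent games and no draws.

```python
import math

def tail_probability(games, wins, p=0.5):
    """Exact probability of scoring at least `wins` in `games` independent games."""
    return sum(
        math.comb(games, k) * p ** k * (1.0 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

def chance_of_lucky_version(versions, games, wins):
    """Probability that at least one of several equal-strength versions
    reaches `wins` purely by chance."""
    single = tail_probability(games, wins)
    return 1.0 - (1.0 - single) ** versions

if __name__ == "__main__":
    single = tail_probability(100, 60)
    any_of_20 = chance_of_lucky_version(20, 100, 60)
    # 60% corresponds to roughly 70 Elo under the logistic Elo model.
    apparent_elo = 400.0 * math.log10(0.6 / 0.4)
    print(f"one version scores 60/100 by luck: {single:.3f}")
    print(f"at least one of 20 does:           {any_of_20:.3f}")
    print(f"apparent gain of a 60% score:      {apparent_elo:.0f} Elo")
```

So with a couple of dozen candidates and 100 games each, the chance that some version with no real improvement looks about 70 Elo stronger is not small at all, which is why rerunning the match matters.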

Chrilly

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
