I think one of the problems is in testing. Currently we have almost
no way to judge whether an improvement is good or bad, other than
playing a lot of games against GNU Go. That takes a very long time and
seems inefficient. Moreover, it may not even be a very good method.
GNU Go often cannot respond correctly to an obviously bad move, so
pruning such moves can actually decrease the winning rate.
This is THE problem in game programming: measuring progress. A typical
improvement is worth about 10 Elo, and it takes about 1000 games to detect
such an improvement with statistical significance. Usually one does not play
1000 games; 100 games are already quite a lot. So one often chooses not the
best version but the luckiest one. If one version has an especially good
result, I rerun the test matches under different conditions (time settings).
Only if the results are repeatable is the version considered the best.
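
The "luckiest version" effect can be illustrated with a small simulation.
This is only a sketch under simplifying assumptions that are not from the
discussion itself: 20 hypothetical versions of exactly equal strength, 100
games each, a 50% win chance per game, no draws, and a fixed random seed.

```python
import random


def simulate_match(win_prob, games, rng):
    """Return the number of wins in `games` independent games (no draws)."""
    return sum(rng.random() < win_prob for _ in range(games))


def best_of_equal_versions(versions=20, games=100, win_prob=0.5, seed=1):
    """Play one match per version; all versions have the same true strength.

    The version count, game count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    results = [simulate_match(win_prob, games, rng) for _ in range(versions)]
    return max(results), results


best, results = best_of_equal_versions()
print("wins per version:", results)
print("best version:", best, "wins out of 100")
```

Although no version is really better than another, the best of the 20
results will usually sit well above 50 wins, which in a single 100-game match
looks like a gain of several tens of Elo. Rerunning the "winner" is what
exposes it as luck.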
If an improvement is worth 100 Elo, there is no need for extensive testing;
one sees this immediately. In fact, the smaller improvements are in the end
also chosen by intuition/feeling.
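
The orders of magnitude here can be checked with the standard Elo
expected-score formula and a normal approximation. The following is a rough
sketch, not a definitive test procedure: it assumes independent games without
draws, and the function names and the parameter z (how many standard errors
the gain must exceed) are chosen only for illustration.

```python
import math


def expected_score(elo_diff):
    """Expected score of the stronger side under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, z=1.96):
    """Games until the expected gain over 50% exceeds z standard errors.

    Assumes no draws, so the per-game standard deviation is sqrt(p * (1 - p)).
    """
    if elo_diff == 0:
        raise ValueError("a zero Elo difference can never be detected")
    p = expected_score(elo_diff)
    sigma = math.sqrt(p * (1.0 - p))
    return math.ceil((z * sigma / (p - 0.5)) ** 2)


for diff in (10, 100):
    print(diff, "Elo:", "expected score", round(expected_score(diff), 3),
          "| games at z=1:", games_needed(diff, z=1.0),
          "| games at z=1.96:", games_needed(diff))
```

Under these assumptions a 10 Elo gain needs on the order of a thousand games
just to stand one standard error above 50%, and several thousand at the
stricter z = 1.96, while a 100 Elo gain shows up within a few dozen games.
Draws reduce the variance per game, so real numbers may be somewhat lower.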
In Go things are worse insofar as there is only one standard sparring
partner, GNU Go. This creates severe inbreeding effects. In chess there was
a similar problem. There were more strong opponents around, but over the
years they became very similar. Suddenly there was a new program, Rybka,
which plays differently, and all the inbred programs have a lot of
difficulties with it.
I think there is no better way. One can do some pre-filtering with test
positions. If a version is especially bad in these tests, one can ignore it.
But being good in test positions and in games are different things.
Erdstrahlen:
Jan Louwman was a fanatic tester. His small house was full of
board computers. He played 20 games at once by hand (we are in the pre-PC
computer chess times).
He always reported spectacular results for the programs of Ed Schroeder.
But when the programs went to market, nobody could replicate Jan's results.
The programs were strong, but not spectacular. Thomas Mally of the Viennese
chess magazine Module explained this with the different natural radiation
(German "Erdstrahlen") in Rotterdam and elsewhere: Ed's programs were
optimized for these "Erdstrahlen". The "Erdstrahlen-Theorie" became a running
joke in the chess community. Whenever two testers reported quite different
results, it was "explained" by the different amount of "Erdstrahlen".
It is impossible to play 1000 games by hand for each version. Jan usually
played at 30 sec. or 1 min. per move. It would have taken forever. His
spectacular version was just a very lucky one. If you test enough versions,
you always get one. But his testing was certainly a significant contribution
to the development of Rebel. And it was very good medicine for Jan. He would
have died much earlier without this testing.
Chrilly