In the case of null-move, the improvement was not limited to Nimzo-Nimzo
games. What is especially overestimated is the speed advantage. If, e.g.,
two programs have an identical evaluation function but one of them searches
2 plies deeper, the slower twin has practically no chance. The faster one
must commit suicide to lose. With different evaluations it's not so clear
anymore, although the deeper searcher still holds the better cards.
The Deep Blue team reported very big improvements when they built in
singular extensions. This was based on self-play and test positions. The
Deep Blue team argued that they had no other serious opponent and that
self-play was the only way to test it.
In later experiments the improvements almost vanished. Other researchers
reported very mixed results. Some programs use the technique, some do not.
In the case of singular extensions it depends not only on the type of
opponent but also on the overall search techniques. The parts must fit
together. E.g., the open-source program Fruit is considered to have a very
good search. I tried this search in Hydra, but that version was almost
200 Elo worse. I also tried the opposite and implemented the Hydra search
in Fruit. This time the drop was only 100 Elo. I found nothing useful in
Fruit for the Hydra search, nor the other way round.
In the case of a weak program there is additionally Vincent's law. Testing a
new technique in a weak program against itself is, in my opinion, useless.
Games are additionally hard real-time problems. E.g., in the Orego tests both
versions got the same number of nodes. For a realistic comparison one has
to give both sides the same time, not the same node budget. Given the
clear result, the enhanced version would probably have won under equal time
too. The trade-off between speed and knowledge/more refined methods is an
important part of the game.
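To make the time-versus-node-budget point concrete, here is a minimal
sketch in Python. All figures (nodes per second, time per move) are
hypothetical and chosen only for illustration; they are not taken from the
Orego tests. It shows how a version that spends more work per node gets
fewer nodes once both sides are given the same clock.

```python
def nodes_in_time(time_budget_s: float, nodes_per_second: float) -> int:
    """Number of nodes a version can search within a fixed time budget."""
    return int(time_budget_s * nodes_per_second)


# Hypothetical figures, for illustration only.
baseline_nps = 10_000.0   # plain version: cheap nodes
enhanced_nps = 6_000.0    # enhanced version: more work per node
time_per_move_s = 5.0

baseline_nodes = nodes_in_time(time_per_move_s, baseline_nps)
enhanced_nodes = nodes_in_time(time_per_move_s, enhanced_nps)

# With an equal node budget both would get the same count; with an equal
# time budget the enhanced version has to win with fewer nodes.
print(baseline_nodes, enhanced_nodes)
```

An equal-node test hides this cost, so it can only flatter the more
expensive version.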
Chrilly
=======
I have heard this many times - but it doesn't always apply. In fact I
have heard that IMPROVEMENTS always look better against your
twin-brother but if that were true, I would always want to test against
my twin since it makes improvements stand out. It's hard to measure
small improvements so this would be like using a microscope to help me.
Unfortunately, a change can help you beat your twin but make the
program worse against other opponents. I have only occasionally
seen this be a big factor, although I admit it does happen. I just
think the effect is exaggerated by people. A general rule is that if
you are better against Joe, you are "probably" better against Fred.
I'm doing some experiments with automated tuning (with my chess
program). I have had a lot of success with this: in self-test games I
increased my win percentage significantly over a lot of games. When I
played Toga, I actually achieved an even HIGHER win percentage.
But I never trust an "improvement" without testing against a variety of
opponents, at the minimum a self-test and a test against a different
opponent.
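As a rough illustration of comparing a self-test with a test against a
different opponent, the sketch below converts a match score into an Elo
difference using the standard logistic Elo formula and attaches an
approximate 95% interval. The function names and the normal approximation
are assumptions made for this example, not something either poster
describes using.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (low, score, high) for the match score (normal approximation)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return score - half_width, score, score + half_width


# Hypothetical match result, for illustration only.
low, score, high = score_interval(wins=60, losses=40, draws=0)
print(elo_diff(low), elo_diff(score), elo_diff(high))
```

Running the same calculation on the self-test result and on the result
against another opponent shows whether the two intervals overlap, which is
a quick sanity check before trusting a small "improvement".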
- Don
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/