On Mon, 2008-09-08 at 20:01 +0200, Gian-Carlo Pascutto wrote:
> Don Dailey wrote:
>
> > That probably just means I have not stumbled on the right ideas or that
> > I was not able to properly tune it. I would be delighted if someone
> > was able to show us a workable scheme. I believe if something is found
> > it will result in a very minor improvement, but that it will be an
> > actual improvement.
>
> Would a discrepancy on the amount of ELO gained or lost per handicap
> stone, when comparing MC bots to humans & classical computers, be a good
> measure of the maximum possible improvement?
Maybe. How could you accurately make such a measurement without thousands of games?

The problem seems to be a catch-22. If you are in a dead won position, it's really risky to tell the program you are NOT in a dead won position. It then no longer understands what is required to win the game; it only knows that it must win another stone at all costs, whether that's possible or not. Most of the time it IS possible to win another stone or more without much risk. But as soon as you do, the dynamic komi adjuster says you must do it again, and again, and again, until you reach a situation where it is not possible. I believe that is where the trouble comes. At some point, you have set a goal too high to reach. This is a signal to the program that it must try at all costs to win (what appears to it to be) a dead lost game, and of course it will very likely play a high-risk desperation move in order to please its master.

So some simple naive scheme is not going to work. However, this would probably work pretty well if you had some way to gain prior knowledge about whether it would be safe to "escalate" or not.

One simple way that might work with some tuning is to use search. If you are winning the game with high confidence, adjust the komi by a few stones and do another search (from scratch) to see if you are still easily winning. Perhaps something like a binary search will find the komi value that gives you a high winning confidence with maximum greed, or some acceptable balance of the two. However, such a scheme is going to cost you resources, which may cancel some or all of the benefit.

My own gut feeling tells me that you are playing with pretty small margins. At best, how much can we expect to gain? Still, I think this is probably something we need to explore and pursue, especially if it's important to the reputation of your product, or if you want a product that more closely mimics the style of human players, who are less concerned about the beauty of omission.
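To make the binary-search idea a bit more concrete, here is a minimal sketch, not anyone's actual implementation. It assumes a hypothetical callable `win_rate(extra_komi)` standing in for "do another search from scratch with this much extra komi against yourself and report the winning probability", and it assumes that probability never goes up as the extra komi grows. The names `find_greedy_komi` and `toy_win_rate`, the 0.9 threshold, the 30-point cap and the logistic toy model are all illustrative assumptions.

```python
import math
from typing import Callable


def find_greedy_komi(
    win_rate: Callable[[float], float],
    threshold: float = 0.9,
    max_extra: float = 30.0,
    step: float = 1.0,
) -> float:
    """Binary-search the largest extra komi (a multiple of `step` in
    [0, max_extra]) at which win_rate(extra) is still >= threshold.

    Assumes win_rate is non-increasing in extra komi. Each call to
    win_rate stands for a fresh search from scratch, so the number of
    calls (about log2(max_extra / step) + 1) is the resource cost.
    """
    if win_rate(0.0) < threshold:
        return 0.0  # not confidently winning even at the real komi: don't escalate
    lo, hi = 0, int(max_extra / step)  # indices into the komi grid
    # Invariant: win_rate(lo * step) >= threshold
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if win_rate(mid * step) >= threshold:
            lo = mid  # still confidently winning: try to be greedier
        else:
            hi = mid - 1  # goal too high to reach: back off
    return lo * step


def toy_win_rate(extra_komi: float) -> float:
    """Hypothetical stand-in for a full search: a logistic curve for a
    position we lead by about 12 points."""
    return 1.0 / (1.0 + math.exp((extra_komi - 12.0) / 2.0))


print(find_greedy_komi(toy_win_rate))  # → 7.0
```

The threshold is where the "acceptable balance" between greed and safety lives: the program only escalates as far as it still expects to win with high confidence, instead of ratcheting the goal up until it becomes unreachable.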
I personally see a bit of beauty in this style, even though it certainly looks odd when you are not used to it.

When losing the game, a dynamic adjuster may be safer. After all, you are losing anyway, so why not try something? It's not risky to try to win a lost game by picking off whatever stones you can.

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/