On Thu, 2007-07-05 at 10:50 -0600, David Silver wrote:

> We tried the whole spectrum from completely random to completely
> deterministic playouts, but we never came close to the performance of
> the "dumb" playouts!
I don't understand - I thought MoGo wasn't using "dumb" play-outs?

> We have seen a similar effect many times in MoGo. Often we try
> something that seems like it should improve the quality of the
> simulation player, but it makes the overall performance worse. It is
> frustrating and surprising! Has anyone else encountered this?

Here is why this happens. Let's think about it in the context of pruning
moves, since controlling the play-outs can be cast this way. Presumably,
if you run 1000 random play-outs from a given position, you will get a
fair indication of how good the position is. But what if you are able to
prune many of the bad moves from those simulations? Would this improve
the accuracy of the estimate? Probably, but not necessarily.

Suppose that during the play-outs you are able to prune out 50% of the
"bad" black moves, but only 30% of the "bad" white moves. You would then
be playing 1000 simulations in which BLACK was playing consistently
stronger, regardless of how good the actual position was. If the chances
were in fact pretty much even, it would look as if black had a big
advantage. And if that color bias is consistent for that "type" of
position, building a UCT tree below it will not quickly fix the problem.

The extra knowledge you impose is not impartial: it works better for one
side than the other, and differently in one position than in another. So
even if the average quality of the play-outs improves, each position
responds differently to the extra knowledge, making it more difficult to
compare one position to another.

There is one other issue I have seen that is similar. Sometimes Lazarus
will play a move that neither helps nor hurts its position. It's not a
wasted move, because the opponent must respond or else lose. An example
is a simple self-atari which is itself a direct threat.
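Going back to the pruning argument: the color bias is easy to demonstrate
with a toy simulation. This is my own sketch, not MoGo's or Lazarus's
actual playout policy - it models an "even" abstract position where each
side blunders at some rate, and pruning simply lowers a side's blunder
rate. All the numbers (40 moves, a 30% base blunder rate) are made up
for illustration.

```python
import random

def playout(p_black_blunder, p_white_blunder, moves=40, rng=random):
    # Abstract "even" position: each side makes `moves` moves, each move
    # is a blunder with the given probability; fewer blunders wins.
    b = sum(rng.random() < p_black_blunder for _ in range(moves))
    w = sum(rng.random() < p_white_blunder for _ in range(moves))
    if b == w:
        return rng.random() < 0.5  # the position really is 50/50
    return b < w                   # True means Black wins

def estimated_black_win_rate(p_black, p_white, n=20000, seed=1):
    rng = random.Random(seed)
    return sum(playout(p_black, p_white, rng=rng) for _ in range(n)) / n

base = 0.30  # blunder rate of the "dumb" playout policy (made-up number)

# Unbiased dumb playouts: the estimate comes out close to the true
# value of the position, 0.5.
print(estimated_black_win_rate(base, base))

# "Smarter" playouts that prune 50% of Black's bad moves but only 30%
# of White's: the very same even position now looks like a clear Black
# advantage, even though nothing about the position changed.
print(estimated_black_win_rate(base * 0.5, base * 0.7))
```

The point of the sketch is that the second estimate is badly skewed not
because the playouts got weaker - both sides blunder less - but because
the improvement was uneven between the colors.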
The opponent is forced to respond, so there is no reason not to try for
the cheap shot in his territory; but in the grand scheme of things such
a move is a distraction, and if you could remove moves like it from the
tree, it would help the program focus on what is really important.

However, it sometimes pays to try moves like these. When I "fixed" this
problem in Lazarus, it started winning less against weaker programs,
simply because they sometimes fail to defend. I imagine this can happen
in more sophisticated contexts, where certain moves could be very
effective at exploiting more naive (but not totally stupid) programs. In
such a case, an improvement could make your program appear weaker.

I don't think this is merely academic, because I model skill not as how
many good moves you play, but as how many bad moves you avoid playing.
In other words, there is no such thing as a good move - there are only
bad moves. So you want to present your opponent with opportunities to
play them. It works the same way in chess: if you are playing a weak
opponent, it's really stupid to trade quickly into a drawn ending just
because the position is even.

- Don

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/