I haven't gotten very far yet in incorporating many of the suggestions published on this mailing-list into the MCTS ref-bot. As such I feel I still have a lot of catching up to do when it comes to MC programs, mostly due to lack of time.

But I had an idea I wanted to share, as I haven't seen anything like it described here. It stems from an observation I made when looking at playouts and the effects some of the patterns had on them. So far it's my opinion that guiding playouts is mostly useful for maintaining certain features of the original position and preventing the random walk from stumbling into an unreasonable result. As an example I'm going to use the simple case of a stone in atari that cannot escape. When random play tries an escaping move, I make the program automatically play the capturing move to maintain the status of the stone(s) (now more than one) in atari. When implementing something like that in the playouts, however, more often than not this 'pattern' arises not from the original position but purely from the random play. I figured it doesn't help the program at all to try to maintain the captured status of a stone or stones that weren't even on the board at the start of the playout.

So I tried a simple experiment: whenever a single stone is placed on the board I record the time (the move number, really) at which it was created in an array I call stoneAge. When more stones are connected to the original they get the same age. When two chains merge I pick an arbitrary age of the two (I could have picked the smallest number, but it doesn't really matter). So for each chain of stones the array marks roughly the earliest time of creation (exactly the earliest only if the smallest number is kept on a merge). Next, when a playout starts, I mark the starting time in a variable I call 'playoutStart' and there's a simple function:

// True if the chain already existed when the current playout started.
boolean isPrehistoric(int chain)
{
        return stoneAge[chain] <= playoutStart;
}

During a playout, I only apply the tactical reader to chains for which the isPrehistoric() function returns true. Tests show that using this method doesn't affect the strength of the program at all. But the amount of time spent in the tactical reader is cut to less than half.
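
For anyone who wants to try this, the bookkeeping described above could be sketched roughly as follows. This is only a minimal illustration, not the ref-bot's actual code: the class StoneAgeTracker and the methods onNewChain, onStoneAdded, onMerge and startPlayout are names I made up for this sketch, and it assumes chains are identified by small integer ids. On a merge it keeps the smaller age, which is one of the two choices mentioned above.

```java
// Illustrative sketch only. Apart from stoneAge, playoutStart and
// isPrehistoric, every name here is a hypothetical choice for this example.
final class StoneAgeTracker {
    private final int[] stoneAge;  // per chain id: move number at which the chain was created
    private int moveNumber = 0;    // number of moves played so far
    private int playoutStart = 0;  // move number at which the current playout began

    StoneAgeTracker(int maxChains) {
        stoneAge = new int[maxChains];
    }

    // A single stone forms a new chain: stamp it with the current move number.
    void onNewChain(int chain) {
        moveNumber++;
        stoneAge[chain] = moveNumber;
    }

    // A stone is connected to an existing chain: the chain keeps its age.
    void onStoneAdded(int chain) {
        moveNumber++;
    }

    // Two chains merge into 'survivor'. This sketch keeps the smaller age;
    // picking either age arbitrarily is the other option.
    void onMerge(int survivor, int absorbed) {
        stoneAge[survivor] = Math.min(stoneAge[survivor], stoneAge[absorbed]);
    }

    // Call once when a playout starts.
    void startPlayout() {
        playoutStart = moveNumber;
    }

    // True if the chain already existed when the current playout started.
    boolean isPrehistoric(int chain) {
        return stoneAge[chain] <= playoutStart;
    }
}
```

In the playout loop the tactical reader would then only be called for a chain when isPrehistoric(chain) returns true; chains created by the random play itself are left alone.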

I suspect the same holds true to a large degree for other patterns, but I haven't had the time yet to test that. Other cases may not provide as much gain because they are cheaper to compute. But I think in general it's better to let the random play run its course as much as possible and restrict moves guided by patterns as much as possible to situations relevant to the original position. The stone-age information is very cheap to maintain so it's hardly a burden to use.

Hope this helps someone, especially those with slow tactical readers :) If anyone manages to use this successfully in situations other than tactical readers I'd be interested to hear about it, as so far it's only a hunch that this has wider applicability than just tactics. I was going to hold off on posting this until I had time to try it out myself, but lately I haven't had the time.

Mark

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
