I haven't gotten very far yet in incorporating many of the suggestions
published on this mailing-list into the MCTS ref-bot. As such I feel I
still have a lot of catching up to do when it comes to MC programs,
mostly due to lack of time.
But I had an idea I wanted to share, as I haven't seen anything like it
described here. It stems from an observation I made when looking at
playouts and the effect some of the patterns had on them. So far
it's my opinion that guiding playouts is mostly useful in order to
maintain certain features of the original position and prevent the
random walk from stumbling into an unreasonable result. As an example
I'm going to use the simple case of a stone in atari that cannot
escape. When random play tries an escaping move, I make the program
automatically play the capturing move to maintain the status of the
stone(s) (now more than one) in atari. When I implemented something
like that in the playouts, however, I found that more often than not
this 'pattern' arises not from the original position but purely from
the random play. I figured it doesn't help the program at all to try
to maintain the captured status of a stone or stones that weren't even
on the board at the start of the playout.
So I tried a simple experiment: whenever a single stone is placed on
the board I record the time (the move number, really) at which it was
created in an array I call stoneAge. When more stones are connected to
the original they get the same age. When two chains merge I pick the
age of one of the two arbitrarily (I could have picked the smaller
number, but it doesn't really matter). So for each chain of stones the
array marks, roughly, the earliest time of creation. Next, when a
playout starts, I record the starting time in a variable I call
'playoutStart' and there's a simple function:

    boolean isPrehistoric(int chain)
    {
        return stoneAge[chain] <= playoutStart;
    }

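To make the bookkeeping concrete, here is a minimal sketch of how the
ages could be maintained. Only stoneAge, playoutStart and
isPrehistoric() come from my program; the class name, the chain ids
and the newChain/extendChain/mergeChains/startPlayout methods are
made up for illustration. On a merge this sketch keeps the smaller
age, which is one of the two options mentioned above, not necessarily
the one to prefer.

```java
/** Hypothetical stone-age bookkeeping; chain ids are plain array indices. */
class StoneAgeTracker {
    private final int[] stoneAge; // move number at which each chain was created
    private int moveNumber = 0;   // incremented for every stone placed
    private int playoutStart = 0; // move number at which the current playout began

    StoneAgeTracker(int maxChains) {
        stoneAge = new int[maxChains];
    }

    /** A lone stone starts a new chain: its age is the current move number. */
    void newChain(int chain) {
        moveNumber++;
        stoneAge[chain] = moveNumber;
    }

    /** A stone joins an existing chain: the chain keeps the age it already has. */
    void extendChain(int chain) {
        moveNumber++;
    }

    /** Two chains merge into 'survivor'; this sketch keeps the older (smaller) age. */
    void mergeChains(int survivor, int absorbed) {
        stoneAge[survivor] = Math.min(stoneAge[survivor], stoneAge[absorbed]);
    }

    /** Call once when a playout begins. */
    void startPlayout() {
        playoutStart = moveNumber;
    }

    /** True if the chain already existed when the playout started. */
    boolean isPrehistoric(int chain) {
        return stoneAge[chain] <= playoutStart;
    }
}
```

The extra cost per move is one integer write or comparison.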
During playout, I only apply the tactical reader to chains for which
the isPrehistoric() function returns true. Tests show that using this
method doesn't affect the strength of the program at all, but the
amount of time spent in the tactical reader is cut to less than half.
I suspect the same holds true to a large degree for other patterns,
but I haven't had the time yet to test that. Other cases may
not provide as much gain because they are cheaper to compute. But I
think in general it's better to let the random play run its course as
much as possible and restrict moves guided by patterns as much as
possible to situations relevant to the original position. The stone-
age information is very cheap to maintain so it's hardly a burden to
use.
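
As a rough illustration of the gating itself, the sketch below wraps
an expensive move suggester so that it is consulted only for chains
that predate the playout. Everything except stoneAge, playoutStart
and isPrehistoric() is hypothetical: the class name, the NO_MOVE
sentinel, and the idea of modelling the tactical reader (or any other
pattern matcher) as a function from chain id to move. It is not how
my program is actually structured.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical wrapper: consult the expensive reader only for prehistoric chains. */
class AgeGatedPolicy {
    static final int NO_MOVE = -1; // sentinel: caller falls back to random play

    private final int[] stoneAge;          // creation move number per chain
    private final int playoutStart;        // move number at which the playout began
    private final IntUnaryOperator reader; // stand-in for the reader: chain -> move
    int readerCalls = 0;                   // counts how often the reader is consulted

    AgeGatedPolicy(int[] stoneAge, int playoutStart, IntUnaryOperator reader) {
        this.stoneAge = stoneAge;
        this.playoutStart = playoutStart;
        this.reader = reader;
    }

    boolean isPrehistoric(int chain) {
        return stoneAge[chain] <= playoutStart;
    }

    /** Returns the reader's move for a prehistoric chain, otherwise NO_MOVE. */
    int guidedMove(int chain) {
        if (!isPrehistoric(chain)) {
            return NO_MOVE; // chain arose from random play: let the random walk continue
        }
        readerCalls++;
        return reader.applyAsInt(chain);
    }
}
```

The same wrapper would work for other patterns by passing in a
different function, assuming the pattern can be tied to a chain.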
Hope this helps someone, especially those with slow tactical readers :)
If anyone manages to use this successfully in situations other than
tactical reading I'd be interested to hear about it, as so far it's
only a hunch that this has wider applicability than just tactics. I was
going to wait to post this until I had tried it out myself, but lately
I haven't had the time.
Mark
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/