RAVE is part of a larger family of algorithms. In general we can use
direct Monte-Carlo results (i.e., the move played directly from a
node) to determine the probability of winning after playing such a
move. The generalized RAVE (GRAVE?) family does this by including
(usually with some discount) moves played on "similar" boards.
Different algorithms in this family count different boards as "similar":
Basic MCTS (i.e., UCT) without a transposition table counts no other
boards.
A transposition table counts "identical" boards, i.e., those with the
same stones on the board, player to move, simple ko point, and number
of passes.
AMAF counts all boards.
RAVE counts boards that follow the current board in a playout.
CRAVE (Context-dependent RAVE) counts boards where the neighborhood of
the move in question looks "similar". Dave Hillis discussed one
implementation for this. I tried another; it works better than plain
MCTS, but not as well as RAVE.
NAVE (Nearest-neighbor RAVE) counts some set of boards which have a
small Hamming distance from the current board. Literally storing all
board-move pairs is catastrophically expensive in both memory and time.
DAVE (Distributed RAVE) stores this information "holographically",
storing win/run counts for each move combined with each point/color
combination on the board. Thus, there are a set of runs for when a2 is
black, another for when e3 is vacant, and so forth. To find the values
for a particular board, sum across the points on that board. This is
too expensive, but by probing based on only one random point, I was
able to get something that beats MCTS (but not RAVE).
The following are left as exercises:
http://www.onelook.com/?loc=rz4&w=*ave&scwo=1&sswo=1
It's conceivable that some statistical machine learning technique
(e.g., neural networks) could be applied, with the playouts providing
data for the regression.
The more I study this and try different variants, the more impressed I
am by RAVE. "Boards after the current board" is a very clever way of
defining similarity. Also, recorded RAVE playouts, being stored in
each node, expire in an elegant way. It still seems that RAVE fails to
exploit some "sibling" information. For example, if I start a playout
with black A, white B, and white wins, I should (weakly) consider B as
a response to any black first move.
Peter Drake
http://www.lclark.edu/~drake/
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/