RAVE is part of a larger family of algorithms. In general we can use direct Monte-Carlo results (i.e., the move played directly from a node) to determine the probability of winning after playing such a move. The generalized RAVE (GRAVE?) family does this by including (usually with some discount) moves played on "similar" boards. Different algorithms in this family count different boards as "similar":

Basic MCTS (i.e., UCT) without a transposition table counts no other boards.

A transposition table counts "identical" boards, i.e., those with the same stones on the board, player to move, simple ko point, and number of passes.

AMAF counts all boards.

RAVE counts boards that follow the current board in a playout.

CRAVE (Context-dependent RAVE) counts boards where the neighborhood of the move in question looks "similar". Dave Hillis discussed one implementation for this. I tried another; it works better than plain MCTS, but not as well as RAVE.

NAVE (Nearest-neighbor RAVE) counts some set of boards which have a small Hamming distance from the current board. Literally storing all board-move pairs is catastrophically expensive in both memory and time.

DAVE (Distributed RAVE) stores this information "holographically", storing win/run counts for each move combined with each point/color combination on the board. Thus, there are a set of runs for when a2 is black, another for when e3 is vacant, and so forth. To find the values for a particular board, sum across the points on that board. This is too expensive, but by probing based on only one random point, I was able to get something that beats MCTS (but not RAVE).

The following are left as exercises:

http://www.onelook.com/?loc=rz4&w=*ave&scwo=1&sswo=1

It's conceivable that some statistical machine learning technique (e.g., neural networks) could be applied, with the playouts providing data for the regression.

The more I study this and try different variants, the more impressed I am by RAVE. "Boards after the current board" is a very clever way of defining similarity. Also, recorded RAVE playouts, being stored in each node, expire in an elegant way. It still seems that RAVE fails to exploit some "sibling" information. For example, if I start a playout with black A, white B, and white wins, I should (weakly) consider B as a response to any black first move.

Peter Drake
http://www.lclark.edu/~drake/



_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to