There is much high-level data to be found within the MC runs, such as whether a group is alive or not, etc.
Now, I don't know if it is easy to inject it back within the simulations. Another approach (not excluding the first one) would be to gather much lower-level data. It's especially sad that the playouts have to discover over and over again that particular sequences are good or bad; we certainly could incorporate this knowledge in the following playouts. To know a sequence, it is sufficient to know which move follows each move. We could store that in a way similar to ``All moves as first'' (even if I thought about that before knowing AMAF). We could record the answers to each move on the board, with the results, in the simulations. Exemple: on 10000 simulations, Black plays B2 6000 times. White answers A2 2000 times (1500 wins, 500 losses) C2 3000 times (500 wins, 2500 losses) B3 1000 times (500 wins, 500 losses) Now, there could be many variants. The easiest way, and one of the least memory-hungry (already very bad from that point of view, though), would be to keep after each move how many times each possible following move has been played, and with what overall overall result. We could also keep tables of the previous move by the same player. A combination of black's previous move + (white's) last move should be enough to distinguish many situations. A more ambitious variant would be to build a tree from each possible move, similarly to the main tree. In such a case, I gather that nodes should be expanded after being OFTEN visited. In both cases, we could/should keep the information from the previous moves, while letting it dim out slowly. A simple idea would be to multiply by say .9 or .95 the number of games each move has been recorded as playing at each stage (or if storage is wins-losses, multiply this number of losses and wins). There are many difficulties, especially since we must not use too much computing power. Notably: - Globally how to blend this with the choice of moves ? It depends on the implementation. - All moves are not simultaneously available: some might already have been played. I don't know if this would really be a handicap. - A move could be an only move in one situation, while another is the only move in another situation, with the same previous moves. I gather this should not be important. This rule would encourage using the move corresponding to the more frequent situation, but the winning rate would decrease because of the other situation, and encouragement should be kept at a reasonable level. - In the case of trees, how do we take into account the sequences in the other trees leading to the same place. I mean, suppose we have a tree N4-M2-O5-M6 and a tree M2-O5-M6. It might be heavy to update everything and keep coherence. - It's probably much too heavy to check whether a win or a loss occured when the move had low or high probability to be chosen. - We have frequencies and winning rates associated to moves. Either we can compute a probabiliy distribution from this data, and blend it in some way with the existing one, or we can compute bonus/malus to each of the moves appearing, and add that to log_prob of the original distribution. The problem of the former is: how do we deal with moves that seldom/never appear after the previous moves ? We cannot really get useful information on them. The problem with the latter is that we might end up giving a bonus to already much favoured moves. I think the way we use this information should satisfy to the following properties: - If there is an only move, the recommendation should be strong enough to mainly overcome all other suggestions. Say chosen 90% of the time. - Symmetrically, a move that always loses should be banned. - If a move gets a new victory, it should get slightly more probable. Mutatis mutandis, the same for loss. - Quick to compute ! Ideally using a Bradley-Terry model or something like that would be a good idea, but unusable during a game. In any case, the ``coefficients'' we use should not be updated often. Maybe once in 1000/10000 simulations if it's quick enough, otherwise once at the beginning of each move. If anyone is interested, I can try to go down to specifics, matching as best I can the existing implementation. As finding a good mathematical procedure is not easy, I shall not work on that if it does not interest anyone. Jonas _______________________________________________ computer-go mailing list computer-go@computer-go.org http://www.computer-go.org/mailman/listinfo/computer-go/