There is a lot of high-level data to be found within the MC runs, such
as whether a group is alive or not, and so on.

Now, I don't know whether it is easy to inject that data back into the
simulations.
Another approach (not excluding the first one) would be to gather much
lower-level data.

It's especially unfortunate that the playouts have to discover over and
over again that particular sequences are good or bad; we could
certainly incorporate this knowledge into the following playouts. To
know a sequence, it is sufficient to know which move follows each move.
We could store that in a way similar to ``All moves as first'' (even
though I thought of this before learning about AMAF).

We could record the answers to each move on the board, with the results,
in the simulations.
Example:
over 10000 simulations,
Black plays B2 6000 times.
     White answers A2 2000 times (1500 wins, 500 losses)
                   C2 3000 times (500 wins, 2500 losses)
                   B3 1000 times (500 wins, 500 losses)
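As a rough sketch of such an answer table (the names and structure here are my own illustration, not taken from any existing engine), the statistics above could be recorded like this:

```python
from collections import defaultdict

# Illustrative sketch: record, for each move, the answers seen in
# playouts and their results. Moves are plain strings ("B2", "A2", ...).
class AnswerTable:
    def __init__(self):
        # stats[move][answer] = [plays, wins]
        # (wins counted from the answering player's point of view)
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, move, answer, won):
        entry = self.stats[move][answer]
        entry[0] += 1
        if won:
            entry[1] += 1

    def win_rate(self, move, answer):
        plays, wins = self.stats[move][answer]
        return wins / plays if plays else 0.0

table = AnswerTable()
table.record("B2", "A2", True)
table.record("B2", "A2", False)
table.record("B2", "C2", False)
print(table.win_rate("B2", "A2"))  # 0.5
```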

Now, there could be many variants.
The easiest way, and one of the least memory-hungry (though already
very bad from that point of view), would be to keep, after each move,
how many times each possible following move has been played, and with
what overall result.
We could also keep tables keyed on the same player's previous move. A
combination of Black's previous move and White's last move should be
enough to distinguish many situations.
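A minimal sketch of that combined key, again with purely illustrative names, would index the answer statistics by the pair (own previous move, opponent's last move):

```python
from collections import defaultdict

# stats[(own_prev, opp_last)][answer] = [plays, wins] -- a sketch of the
# two-move context key, not an existing engine's data structure.
stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

def record(own_prev, opp_last, answer, won):
    entry = stats[(own_prev, opp_last)][answer]
    entry[0] += 1
    entry[1] += 1 if won else 0

record("D4", "C3", "C4", True)
record("D4", "C3", "C4", False)
record("D4", "C3", "B3", True)
print(stats[("D4", "C3")]["C4"])  # [2, 1]
```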

A more ambitious variant would be to build a tree from each possible
move, similar to the main tree. In that case, I gather that nodes
should only be expanded after being visited OFTEN.
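Visit-triggered expansion could look like the following sketch; the threshold value is an arbitrary assumption to be tuned:

```python
# Sketch of expanding a node only after it has been visited often.
class Node:
    EXPAND_THRESHOLD = 32  # illustrative value, not a recommendation

    def __init__(self):
        self.visits = 0
        self.children = None  # None until the node is expanded

    def visit(self, legal_moves):
        self.visits += 1
        if self.children is None and self.visits >= self.EXPAND_THRESHOLD:
            self.children = {m: Node() for m in legal_moves}

n = Node()
for _ in range(32):
    n.visit(["A1", "B2"])
print(sorted(n.children))  # ['A1', 'B2']
```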

In both cases, we could/should keep the information from the previous
moves while letting it fade out slowly. A simple idea would be to
multiply by, say, 0.9 or 0.95 the number of times each move has been
recorded at each stage (or, if storage is wins/losses, to multiply both
the win and loss counts by that factor).
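The fade-out could be as simple as the following sketch, with counts stored as floats so they can decay (0.95 is one of the suggested factors):

```python
# Periodically multiply stored counts by a decay factor so that old
# playout evidence gradually loses weight.
DECAY = 0.95

def decay_stats(stats):
    # stats: {move: {answer: [plays, wins]}}, decayed in place
    for answers in stats.values():
        for entry in answers.values():
            entry[0] *= DECAY
            entry[1] *= DECAY

stats = {"B2": {"A2": [2000.0, 1500.0], "C2": [3000.0, 500.0]}}
decay_stats(stats)
print(stats["B2"]["A2"])  # [1900.0, 1425.0]
```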

There are many difficulties, especially since we must not use too much
computing power. Notably:

- Globally, how do we blend this with the choice of moves? It depends
  on the implementation.
- Not all moves are simultaneously available: some might already have
  been played. I don't know whether this would really be a handicap.
- A move could be the only move in one situation, while another move is
  the only one in a different situation with the same previous moves. I
  gather this should not matter much: the statistics would encourage
  the move corresponding to the more frequent situation, but its
  winning rate would be dragged down by the other situation, so the
  encouragement should stay at a reasonable level.
- In the case of trees, how do we take into account sequences in other
  trees that lead to the same position? I mean, suppose we have a tree
  N4-M2-O5-M6 and a tree M2-O5-M6. It might be costly to update
  everything and keep it coherent.
- It is probably much too expensive to check whether a win or a loss
  occurred when the move had a low or high probability of being chosen.
- We have frequencies and winning rates associated with moves. Either
  we can compute a probability distribution from this data and blend it
  in some way with the existing one, or we can compute a bonus/malus
  for each move that appears and add it to the log-probability of the
  original distribution.
  The problem with the former is: how do we deal with moves that seldom
  or never appear after the given previous moves? We cannot really get
  useful information about them.
  The problem with the latter is that we might end up giving a bonus to
  already much-favoured moves.
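To illustrate the second option, here is a sketch of adding a clipped bonus to each move's log-probability and renormalising; the scale and clip constants are my own assumptions, with the clipping meant to limit how much an already-favoured move can run away:

```python
import math

def blend(base_probs, stats, scale=1.0, clip=2.0):
    # base_probs: {move: p} from the existing playout policy
    # stats: {move: (plays, wins)} from the recorded answers
    logits = {}
    for move, p in base_probs.items():
        bonus = 0.0
        if move in stats:
            plays, wins = stats[move]
            if plays > 0:
                # bonus is 0 at a 50% win rate, positive above, negative below
                bonus = scale * (2.0 * wins / plays - 1.0)
        logits[move] = math.log(p) + max(-clip, min(clip, bonus))
    # renormalise back into a probability distribution
    z = sum(math.exp(v) for v in logits.values())
    return {m: math.exp(v) / z for m, v in logits.items()}

probs = blend({"A2": 0.5, "C2": 0.5},
              {"A2": (2000, 1500), "C2": (3000, 500)})
print(round(probs["A2"], 3))
```

Moves with no recorded statistics simply keep their original log-probability, which sidesteps the "seldom/never seen" problem at the cost of giving them no correction at all.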

I think the way we use this information should satisfy the following
properties:
- If there is an only move, the recommendation should be strong enough
  to mostly overcome all other suggestions: say, chosen 90% of the time.
- Symmetrically, a move that always loses should be banned.
- If a move gets a new victory, it should become slightly more
  probable; mutatis mutandis, the same for a loss.
- It must be quick to compute! Ideally a Bradley-Terry model or
  something like it would fit, but that is unusable during a game.
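Purely as an illustration (the actual procedure is left open above), one cheap weighting that roughly meets these properties is a capped winning-odds ratio with a small prior:

```python
# Weight each move by its winning odds, with a small prior so unseen
# moves stay playable and a cap so a dominant move settles near 90%
# against a neutral one (9 / (9 + 1)). All constants are assumptions.
def move_weight(wins, losses, cap=9.0, prior=0.5):
    odds = (wins + prior) / (losses + prior)
    return min(odds, cap)

print(round(move_weight(100, 0), 2))          # hits the cap: 9.0
print(round(move_weight(0, 100), 3))          # near 0: effectively banned
print(move_weight(6, 5) > move_weight(5, 5))  # a new win raises the weight
```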

In any case, the ``coefficients'' we use should not be updated often:
maybe once every 1000 to 10000 simulations if it's quick enough,
otherwise once at the beginning of each move.

If anyone is interested, I can try to go into specifics, matching the
existing implementation as best I can.
As finding a good mathematical procedure is not easy, I shall not work
on it if it does not interest anyone.

Jonas
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
