Claus,

I think you are raising some very valid questions. I'm a bit ambivalent towards AMAF for very similar reasons. One thing in AMAF's defense, though, is that it doesn't necessarily need to make Go-sense to be useful. MC simulations also don't make much Go-sense. For example, moves are generally rewarded highly just because they allow for the possibility that the other side puts himself into atari with a big group. So they ignore the time dimension almost as much as AMAF does. Yet the evidence is very strong that random MC playouts yield useful information despite the obvious shortcomings. For now I want to keep an open mind and allow for the possibility that AMAF yields useful information as well. But I do foresee its usefulness being much more limited. For example, the most obvious way to improve MC playouts is to make them smarter (and heavier) to avoid their pitfalls, like putting yourself into atari with a big group. A similar solution for AMAF's shortcomings is not so easy to see.
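To make the "smarter playout" idea concrete, here is a minimal Python sketch of a playout move filter that avoids putting a big group of your own into self-atari. It is not taken from any actual bot: the dict-based board, the `min_size` threshold, and all function names are invented for illustration, and it ignores captures for brevity.

```python
import random

# Minimal 9x9 board as a dict {(x, y): 'B' or 'W'}; a sketch, not a full Go engine.
SIZE = 9

def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]

def group_and_liberties(board, start):
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    color = board[start]
    stones, libs, frontier = {start}, set(), [start]
    while frontier:
        p = frontier.pop()
        for n in neighbors(p):
            if n not in board:
                libs.add(n)
            elif board[n] == color and n not in stones:
                stones.add(n)
                frontier.append(n)
    return stones, libs

def is_big_self_atari(board, move, color, min_size=2):
    """Would playing `move` leave our own group of >= min_size stones with
    exactly one liberty?  (Ignores captures -- a real playout must handle them.)"""
    board[move] = color
    stones, libs = group_and_liberties(board, move)
    del board[move]
    return len(stones) >= min_size and len(libs) == 1

def select_playout_move(board, color, candidates):
    """Prefer random moves that don't self-atari a big group; else fall back."""
    candidates = list(candidates)
    random.shuffle(candidates)
    for m in candidates:
        if not is_big_self_atari(board, m, color):
            return m
    return candidates[0] if candidates else None
```

The fallback at the end matters: when every remaining move is a big self-atari, the playout must still play something, or it can't finish.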

So possibly the yield of AMAF becomes more and more limited with smarter playouts. It's definitely worth monitoring that. Just now I ran a test of a few thousand games of plain UCT against UCT+AMAF using 2,000 simulations. The win-rate was 53.3% vs. 46.6% (± 0.9). So it seems AMAF made things a little worse, not better. But this was the first test-run after I finished implementing it, so I'm actually surprised it came this close. I'm sure I'll have a few bugs to fix before I can cast a final verdict. The other difference is that I now only expand the tree after 10 initial simulations, whereas originally I always expanded immediately. It's possible that 2,000 playouts is a bit low to overcome the initial inefficiency of doing 10 simulations before expanding. Lastly, there's the weighting of the RAVE value. Right now I use:
        win/loss + uct-value + rave-value
where the rave-value is the accumulation of the weighted win/loss ratios, with the weight decreasing according to the depth in the sequence at which the move was encountered, the same way it's done in the original ref-bot. This may well be a little too braindead a formula, and at some point I'll have to think a bit about what to do there.
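For reference, a Python sketch of the kind of depth-decayed AMAF accumulation described above. The linear decay `(n - depth) / n`, the first-occurrence rule, and the stats layout are my assumptions about what "the same way as the ref-bot" amounts to, not the ref-bot's actual code.

```python
def amaf_update(stats, playout_moves, winner):
    """Credit every (color, point) played anywhere in the playout, with a
    weight that decays linearly with depth; only the first occurrence of
    each (color, point) counts.
    `stats` maps (color, point) -> (weighted_wins, weighted_visits)."""
    n = len(playout_moves)
    seen = set()
    for depth, (color, point) in enumerate(playout_moves):
        if (color, point) in seen:
            continue
        seen.add((color, point))
        weight = (n - depth) / n          # later moves count less
        w, v = stats.get((color, point), (0.0, 0.0))
        stats[(color, point)] = (w + weight * (1.0 if winner == color else 0.0),
                                 v + weight)
    return stats
```

The per-move rave-value is then simply `weighted_wins / weighted_visits` for that (color, point).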

What I'm going to do next is clean up my new implementation such that I can easily adjust two knobs. One controls the number of simulations before expansion and the other controls the weight of the rave-value in the formula above. In that case I should get exactly the same result as my original plain UCT implementation when I set both to zero. Only after that checks out will I start experimenting with other settings.
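The two knobs can be sketched in Python like this. Only the structure win/loss + uct-value + weighted rave-value is taken from the formula above; the function names, the UCT constant, and the exact handling of unvisited nodes are invented for illustration.

```python
import math

def node_value(wins, visits, parent_visits, rave_wins, rave_visits,
               rave_weight=0.0, c=1.0):
    """Selection value with an adjustable RAVE term.
    rave_weight=0 reduces this to plain UCT (knob #2)."""
    if visits == 0:
        return float('inf')               # always try unvisited children first
    mean = wins / visits                   # win/loss term
    uct = c * math.sqrt(math.log(parent_visits) / visits)
    rave = (rave_wins / rave_visits) if rave_visits > 0 else 0.0
    return mean + uct + rave_weight * rave

def should_expand(node_visits, expansion_threshold=0):
    """Knob #1: only expand a leaf after this many simulations through it;
    0 reproduces immediate expansion as in the original plain-UCT version."""
    return node_visits >= expansion_threshold
```

With both knobs at zero this should behave identically to the plain UCT implementation, which makes it a convenient regression check before experimenting with other settings.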

        Mark

P.S. I got Don's javabot implementation here: http://cgos.boardspace.net/public/javabot.zip Or you can get mine here: https://plug-and-go.dev.java.net/servlets/ProjectProcess?tab=4&stage=1

On 20-nov-08, at 23:34, Claus Reinke wrote:

My first monte carlo programs used ownership info very effectively, but it may be that by using AMAF this information is used already.

As a relative beginner in these matters, the more I look at AMAF,
the less I like it, and I think that has to do with AMAF ignoring
relative sequencing. By looking at all moves as if they were played
first, it ignores that some moves only make sense after certain other
moves. I haven't yet been able to pin this down with evidence,
because the "the game is lost, all moves are equally bad" effect
tends to obscure it (I guess I need to insert "resign" moves at the
right places to limit the game records to the interesting pieces, even
if that means my bot can't score the result).

If I ignore that the Go board is 2d (dimensions, not dans;-), and think only of time (number of moves), space (board positions), and alternatives (simulation runs), then ownership maps project this 3d spacetime multiverse onto a 2d space histogram (likelihood of each position being black/white), while AMAF tries to project these three dimensions onto a 2d time histogram (likelihood of each move resulting in win/loss).
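That "space histogram" takes only a few lines of Python. This sketch assumes terminal positions in which every point is owned by one color (dead stones already removed); the representation is my own, not from any of the bots discussed here.

```python
def ownership_map(final_positions, size=9):
    """For each point, the fraction of playouts in which it ends up black.
    `final_positions` is a list of dicts {(x, y): 'B' or 'W'}, one per
    playout, giving the terminal position of that playout."""
    counts = {}
    for pos in final_positions:
        for point, color in pos.items():
            counts[point] = counts.get(point, 0) + (1 if color == 'B' else 0)
    n = len(final_positions)
    return {point: c / n for point, c in counts.items()}
```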

As the various tweaks (don't count moves by other player, don't count
moves already played, value early moves higher than later moves) and
remaining bot blunders demonstrate, that doesn't quite work right, in
spite of first appearances.

My (as yet unproven) hypothesis is that the issues stem from AMAF
trying to ignore the time dimension of its projection. Hence you get
highly rated moves that place themselves into atari (because that would
have been the last, necessary move to capture the opponent group,
*after* surrounding it), moves that would be suicide right now (but not
at some later point in many playouts), moves on an already occupied
position (no longer occupied later in many playouts), or moves that try
to fill an opponent's eye (which would make sense only if the second,
nearly complete eye were prevented first; but that doesn't always
happen in the same way, hence the lower rating for each of the
possible ways).

Giving priority to early moves alleviates this somewhat, as the experiments show, but then you still don't get perfect loss or win results, even if the
board is effectively owned by one of the players, because the tweaks
have removed some data from the statistics to cover the worst holes.

There should be a variation of AMAF that records move win rates in
move sequences ("these moves are good, but this one always comes
after that one; never try this one unless you are sure you can make that
one as well"), preserving more of the structure in the time dimension
of its projection without just reproducing the whole 3d input data space.
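One naive way to sketch such a sequence-preserving variant in Python: record statistics for ordered pairs of same-colored moves rather than for single moves. This is purely hypothetical bookkeeping (and O(n^2) per playout), meant only to illustrate "this move comes after that one", not an efficient or tested design.

```python
def pairwise_amaf_update(pair_stats, moves, winner):
    """Credit each ordered pair (earlier point, later point) played by the
    same color somewhere in the playout, so ordering information survives.
    `pair_stats` maps (color, p1, p2) -> (wins, visits)."""
    for i, (c1, p1) in enumerate(moves):
        for c2, p2 in moves[i + 1:]:
            if c1 != c2:
                continue                  # only pairs by the same color
            key = (c1, p1, p2)
            wins, visits = pair_stats.get(key, (0, 0))
            pair_stats[key] = (wins + (1 if winner == c1 else 0), visits + 1)
    return pair_stats
```

A high win rate for (p1, p2) together with a low plain-AMAF rate for p2 alone would then flag "never try p2 unless p1 is secured".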

Given that ownership maps don't suffer from this kind of trouble (unless one collapses their information to a score, which has been said to perform badly), it is surprising that they are hard to make good use of. It might just be a matter of asking different questions: AMAF variants try to answer which move is likely best to make first; ownership maps answer which positions are likely to be mine in the end. The latter is not as directly usable, at least not as long as we are only asking about the next move.

That continues with UCT: apart from offering scaffolding for introducing
more Go knowledge, it also introduces a way of interleaving use of
playout information with continued generation of it, but it tends to do so
in a way biased towards move sequences. Again, not the kind of
question that ownership maps can offer answers to, but that just means
that there are other questions we are not asking yet.

Perhaps there is a complement to UCT/AMAF that achieves such
interleaving based on territorial information? One could allocate the
first half of simulation runs to acquiring an ownership map, then use
that information to narrow down the second half of simulations to
the more interesting parts of the board (perhaps less than half the
runs are needed to weigh the rest successfully?).
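A rough Python sketch of that two-phase allocation, working from an ownership map of the kind described earlier (per-point black-ownership fractions in [0, 1]). The 0.2/0.8 "contested" thresholds are arbitrary placeholders, not tuned values.

```python
def contested_points(ownership, low=0.2, high=0.8):
    """Keep only points whose ownership is still uncertain after the
    first batch of simulations."""
    return {p for p, frac in ownership.items() if low < frac < high}

def narrowed_candidates(legal_moves, ownership, low=0.2, high=0.8):
    """Restrict the second batch of simulations to the interesting region,
    falling back to all legal moves if everything looks settled."""
    hot = contested_points(ownership, low, high)
    picked = [m for m in legal_moves if m in hot]
    return picked or list(legal_moves)
```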

Claus

PS. Is there a location where one can find the current versions of
    the plain AMAF reference bots? I seem to be reaching the stage
    where issues I find are not always in my own code, but I don't
    know whether, e.g., JrefBot 081016-2022 is the latest version.



_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
