Claus,
I think you are raising some very valid questions. I'm a bit
ambivalent towards AMAF for very similar reasons. One thing in
defense of AMAF, though, is that it doesn't necessarily need to make
Go-sense to be useful. MC simulations also don't make much Go-sense.
For example, moves are generally rewarded highly just because they
allow for the possibility of the other side putting itself into
atari with a big group. So they ignore the time dimension almost just
as much as AMAF does. Yet evidence is very strong that the random MC
playouts yield useful information despite the obvious shortcomings.
For now I want to keep an open mind and allow for the possibility
that AMAF does yield useful information as well. But I do suspect its
usefulness is much more limited. For example, the most obvious way to
improve MC playouts is to make the playouts smarter (and heavier) to
avoid their pitfalls, like putting yourself into atari with a big
group. A similar fix for AMAF's shortcomings is not so easy to see.
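Just to sketch the kind of playout-smartening I mean (the Board class
and all its methods here are made-up placeholders, not code from my
bot or from the ref-bots):

    // Reject a candidate playout move if it puts a big group of our
    // own into self-atari; everything else passes through unchanged.
    boolean acceptPlayoutMove(Board board, int move, int color) {
        if (!board.isLegal(move, color))
            return false;
        Board copy = board.copy();
        copy.play(move, color);
        // check the group the freshly played stone belongs to
        if (copy.getLibertyCount(move) == 1 && copy.getGroupSize(move) > 2)
            return false;   // self-atari of a big group: skip it
        return true;
    }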
So possibly the yield of AMAF becomes more and more limited as the
playouts get smarter. It's definitely worth keeping an eye on that.
Just now I ran a test of a few thousand games of just plain UCT
against UCT+AMAF using 2,000 simulations. The win-rate was 53.3% vs.
46.6% (± 0.9). So it seems AMAF made things a little worse, not
better. But this was the first test-run after I finished implementing
it, so I'm actually surprised it came this close. I'm sure I'll have
a few bugs to fix before I can cast a final verdict. The other thing
that's different is that I now only expand the tree after 10 initial
simulations, whereas originally I always expanded immediately. It's
possible that 2,000 playouts is a bit low to overcome the initial
inefficiency of doing 10 simulations before expanding. Lastly, there's
the weighting of the RAVE value. Right now I use:
win/loss + uct-value + rave-value
where the rave-value is the accumulation of the weighted win/loss
ratios, with the weight decreasing according to the depth in the
sequence at which the move was encountered, the same way it's done in
the original ref-bot. This may well be a little too braindead a
formula, and at some point I'll have to think a bit about what to do
there.
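In sketch form the accumulation looks something like this (all names
are invented for the example, and I've put in a generic geometric
decay where the ref-bot has its own exact weighting scheme):

    static final int    MAX_POINTS = 19 * 19;
    static final int    PASS       = -1;
    static final double DECAY      = 0.95;  // placeholder decay factor

    double[] raveVisits = new double[MAX_POINTS];
    double[] raveWins   = new double[MAX_POINTS];

    // Credit every one of our moves in the playout, counting each
    // point only once, with a weight that falls off with the depth
    // at which the move was first encountered. Assumes we made the
    // first move of the sequence.
    void accumulateRave(int[] moves, boolean weWon) {
        double weight = 1.0;
        boolean[] seen = new boolean[MAX_POINTS];
        for (int depth = 0; depth < moves.length; depth++) {
            int move = moves[depth];
            boolean ourMove = (depth % 2 == 0);
            if (ourMove && move != PASS && !seen[move]) {
                seen[move] = true;
                raveVisits[move] += weight;
                if (weWon)
                    raveWins[move] += weight;
            }
            weight *= DECAY;  // weight decreases with depth
        }
    }

The rave-value in the formula above is then raveWins[move] /
raveVisits[move].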
What I'm going to do next is clean up my new implementation such that
I can easily adjust two knobs. One controls the number of simulations
before expansion and the other controls the weight of the rave-value
in the formula above. In that case I should get exactly the same
result as my original plain UCT implementation when I set both to
zero. Only after that checks out will I start experimenting with
other settings.
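Something like this (knob names invented for the sketch, and Node
stands for whatever tree-node class the bot has):

    static final double C = 1.0;     // exploration constant, placeholder

    int    expansionThreshold = 0;   // knob 1: simulations before expansion
    double raveWeight         = 0.0; // knob 2: weight of the rave-value

    // win/loss + uct-value + raveWeight * rave-value
    double selectionValue(Node n, int parentVisits) {
        double winLoss  = n.wins / (double) n.visits;
        double uctValue = C * Math.sqrt(Math.log(parentVisits) / n.visits);
        return winLoss + uctValue + raveWeight * n.raveValue();
    }

    boolean mayExpand(Node n) {
        return n.visits >= expansionThreshold;
    }

With expansionThreshold at 0 and raveWeight at 0.0 this should behave
exactly like plain UCT, which is the check I want to do first.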
Mark
P.S. I got Don's javabot implementation here:
http://cgos.boardspace.net/public/javabot.zip
Or you can get mine here:
https://plug-and-go.dev.java.net/servlets/ProjectProcess?tab=4&stage=1
On 20-nov-08, at 23:34, Claus Reinke wrote:
My first monte carlo programs used ownership info very
effectively, but it may be that by using AMAF this information
is already being used.
As a relative beginner in these matters, the more I look at AMAF,
the less I like it, and I think that has to do with AMAF ignoring
relative sequencing. By looking at all moves as if they were played
first, it ignores that some moves only make sense after certain other
moves. I haven't yet been able to pin this down with evidence,
because the "the game is lost, all moves are equally bad" effect
tends to obscure it (I guess I need to insert "resign" moves at the
right places to limit the game records to the interesting pieces, even
if that means my bot can't score the result).
If I ignore that the Go board is 2d (dimensions, not dans;-), and
think only of time (number of moves), space (board positions), and
alternatives (simulation runs), then ownership maps project this 3d
spacetime multiverse onto a 2d space histogram (likelihood of each
position being black/white), while AMAF tries to project these three
dimensions onto a 2d time histogram (likelihood of each move resulting
in win/lose).
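In code, such a space histogram is just a per-point counter over
playouts; a sketch, all names invented:

    static final int SIZE  = 9;  // placeholder board size
    static final int BLACK = 1;  // placeholder colour constant

    int playouts = 0;
    int[] blackCount = new int[SIZE * SIZE];

    void record(int[] finalOwner) {  // owner of each point at game end
        playouts++;
        for (int p = 0; p < finalOwner.length; p++)
            if (finalOwner[p] == BLACK)
                blackCount[p]++;
    }

    double blackOwnership(int p) {   // likelihood of p being black
        return (double) blackCount[p] / playouts;
    }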
As the various tweaks (don't count moves by other player, don't count
moves already played, value early moves higher than later moves) and
remaining bot blunders demonstrate, that doesn't quite work right, in
spite of first appearances.
My (as yet unproven) hypothesis is that the issues stem from AMAF
trying to ignore the time dimension of its projection. Hence you get
highly rated moves that place themselves into atari (because that
would have been the last, necessary move to capture the opponent
group, *after* surrounding it), that would be suicide right now (but
not at some later point in many playouts), or that even land on an
occupied position (no longer occupied later in many playouts), or
that try to fill an opponent's eye (which would make sense only if
the second, nearly complete eye were prevented first, but that
doesn't always happen in the same way, hence a lower rating for each
of the possible ways).
Giving priority to early moves alleviates this somewhat, as the
experiments show, but then you still don't get perfect loss or win
results, even if the board is effectively owned by one of the
players, because the tweaks have removed some data from the
statistics to cover the worst holes.
There should be a variation of AMAF that records move win rates in
move sequences ("these moves are good, but this one always comes
after that one; never try this one unless you are sure you can make
that one as well"), preserving more of the structure in the time
dimension of its projection without just reproducing the whole 3d
input data space.
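The cheapest such variation I can think of would keep pairwise
"this after that" counts next to the plain AMAF counts; a sketch,
all names invented:

    static final int POINTS = 81;  // placeholder, 9x9

    double[][] pairTries = new double[POINTS][POINTS];
    double[][] pairWins  = new double[POINTS][POINTS];

    // earlier appeared before later in the playout's move sequence
    void recordPair(int earlier, int later, boolean won) {
        pairTries[earlier][later]++;
        if (won)
            pairWins[earlier][later]++;
    }

    // win rate of playing `later` once `earlier` is already in place
    double winRateAfter(int later, int earlier) {
        if (pairTries[earlier][later] == 0)
            return 0.5;  // no data yet: neutral
        return pairWins[earlier][later] / pairTries[earlier][later];
    }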
Given that ownership maps don't suffer from this kind of trouble
(unless one collapses their information to a score, which has been
said to perform badly), it is surprising that they are hard to make
good use of. It might just be a matter of asking different questions:
AMAF variants try to answer which move is likely best to make first;
ownership maps answer which positions are likely to be mine in the
end. The latter is not as directly useable, at least not as long as
we are only asking about the next move.
That continues with UCT: apart from offering scaffolding for
introducing more Go knowledge, it also introduces a way of
interleaving use of playout information with continued generation of
it, but it tends to do so in a way biased towards move sequences.
Again, not the kind of question that ownership maps can offer answers
to, but that just means that there are other questions we are not
asking yet.
Perhaps there is a complement to UCT/AMAF that achieves such
interleaving based on territorial information? One could allocate the
first half of simulation runs to acquiring an ownership map, then use
that information to narrow down the second half of simulations to
the more interesting parts of the board (perhaps less than half the
runs are needed to weigh the rest successfully?).
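A sketch of that allocation, with every name a placeholder for
whatever the bot already provides:

    void search(Board root, int budget) {
        OwnershipMap map = new OwnershipMap(root.size());
        for (int i = 0; i < budget / 2; i++)
            map.record(randomPlayout(root));    // phase 1: build the map
        java.util.List<Integer> contested = new java.util.ArrayList<Integer>();
        for (int p : root.emptyPoints())
            if (map.blackOwnership(p) > 0.1 && map.blackOwnership(p) < 0.9)
                contested.add(p);               // neither side clearly owns p
        for (int i = 0; i < budget / 2; i++)
            runUctSimulation(root, contested);  // phase 2: focus the search
    }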
Claus
PS. Is there a location where one can find the current versions of
the plain AMAF reference bots? I seem to be reaching the stage
where issues I find are not always in my own code, but I don't
know whether, e.g., JrefBot 081016-2022 is the latest version.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/