A post from Michael Williams led me to reread the mail below. Until now
I hadn't looked closely at the code of Don's reference bot and had
relied instead on the description he gave:
On 23-Oct-08, at 14:29, Don Dailey wrote:
Let me give you a simple example where we set the level to a measly 2
playouts:
So you play 2 "uniformly random games" that go like the following. This
is a nonsense game that probably has illegal moves, but I just made up
the moves for the sake of example:
Black to move:
c3 d4 g5 f6 b2 g7 c3 d4 g7 pass pass - black wins
d4 b2 g7 c6 g5 a5 c3 d5 c7 pass pass - black loses
In the first game black played:
c3, g5, b2, c3, g7
... but we cannot count g7 because white played it first.
... and we can only count c3 once because black played it twice.
So our statistics so far for black are:
c3 - 1 win 1 game
g5 - 1 win 1 game
b2 - 1 win 1 game
In the second game black played:
d4, g7, g5, c3, c7 - and black played these moves first but lost.
If you combine the statistics you get:
c3   1 win    2 games   0.50
g5   1 win    2 games   0.50
b2   1 win    1 game    1.00
d4   0 wins   1 game    0.00
g7   0 wins   1 game    0.00
c7   0 wins   1 game    0.00
The highest scoring move is b2 so that is what is played.
It's called AMAF ("all moves as first") because we only care whether
black played the move; we don't care WHEN he played it. So AMAF "cheats"
by viewing a single game as several separate games to get many data
points from 1 game instead of just 1 data point.
Please note that we don't care what WHITE did, we only take statistics
on the moves black made (although we ignore any move that black didn't
play first). So if white plays e5 and later BLACK plays e5 (presumably
after a capture) then we cannot count e5 because white played it first.
I think if you study my example you will understand.
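The counting rule in Don's example can be sketched in Python as follows.
This is only an illustration, not Don's reference bot: the function name
amaf_stats and the input format (a move list plus a flag saying whether
black won) are made up here, and it assumes black moves first and that
passes are never counted.

```python
from collections import defaultdict

def amaf_stats(playouts):
    """Tally AMAF wins/games for black (hypothetical helper, black moves first)."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for moves, black_won in playouts:
        seen = set()  # points already played by either colour in this playout
        for i, move in enumerate(moves):
            if move == "pass" or move in seen:
                continue  # ignore passes and any repeat of an earlier move
            seen.add(move)
            if i % 2 == 0:  # even indices are black's moves
                games[move] += 1
                wins[move] += int(black_won)
    return wins, games

# The two made-up playouts from the example, with black's result.
playouts = [
    ("c3 d4 g5 f6 b2 g7 c3 d4 g7 pass pass".split(), True),
    ("d4 b2 g7 c6 g5 a5 c3 d5 c7 pass pass".split(), False),
]
wins, games = amaf_stats(playouts)
rates = {m: wins[m] / games[m] for m in games}
best = max(rates, key=rates.get)
```

Run on the two made-up games, this reproduces the combined table above,
with b2 ranked highest at 1.00.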
So I understand from the above that when a playout leads to a win you
add 1 to the wins.
But in the code you subtract one when it leads to a loss. So don't
the actual statistics in the example above become:
c3   1 win    2 games    0.00
g5   1 win    2 games    0.00
b2   1 win    1 game     1.00
d4   0 wins   1 game    -1.00
g7   0 wins   1 game    -1.00
c7   0 wins   1 game    -1.00
Maybe it doesn't matter and it leads to the same result. But I was
trying to make sense of what Michael wrote in light of what I coded
myself.
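Whether the two bookkeeping schemes can disagree is easy to check. The
sketch below assumes (this is not confirmed from Don's code) that the
+1/-1 total is still divided by the number of games a move was counted
in. Under that assumption the signed score equals 2 * (wins / games) - 1,
a rescaling that cannot change which move ranks highest. The helper name
mean_signed is hypothetical.

```python
def mean_signed(wins, games):
    """Average of +1 per win and -1 per loss (assumed scoring, see above)."""
    losses = games - wins
    return (wins - losses) / games

# (wins, games) per move, taken from the combined table in the example.
stats = {
    "c3": (1, 2), "g5": (1, 2), "b2": (1, 1),
    "d4": (0, 1), "g7": (0, 1), "c7": (0, 1),
}
win_rate = {m: w / n for m, (w, n) in stats.items()}
signed = {m: mean_signed(w, n) for m, (w, n) in stats.items()}

# Both scores put the moves in the same order, so the same move is chosen.
same_order = sorted(stats, key=win_rate.get) == sorted(stats, key=signed.get)
best_by_rate = max(win_rate, key=win_rate.get)
best_by_signed = max(signed, key=signed.get)
```

If the code instead compared raw +1/-1 sums without dividing by the game
count, the ordering could differ, so this only holds under the stated
assumption.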
Mark