Don Dailey wrote:
I personally have serious doubts about knowledge extraction from human
games, but I hope you have success. I think you can get more from
computer games of strong players even though the level is weaker. Here
is why I say that:
1. A strong computer still plays a lot of good moves - so the delta
between human and computer games is not as high as you think.
2. A certain consistency in computer games that humans don't possess.
3. You have access to the internals, such as a score that quantifies
moves.
My idea combines offline knowledge with MC methods. So there is a lot
of online computer generated knowledge. I don't pretend to make a dumb
savant program, just playing the only move it finds in a joseki database.
But if the program considers the joseki stored in (state, action[i-1])
-> action[i] records. It will immediately start searching a tree whose
first move at all depths is the joseki sequence. If the joseki depends
on a ladder, it will find that ladder much earlier and if it is not good
it will hopefully play something else. At blitz time settings, playing
the wrong joseki is bad but not terrible. That would be about the level
of a human dan making a mistake in a blitz game, still better than current
programs.
In this context, human knowledge produces a priori values for UCT. (The
exact formula of the MC tree search is not yet decided.) There is enough
computer generated knowledge in the search. When the end of the game
is still 200+ moves ahead, the search alone does not find enough
difference between good and not so good moves.
The knowledge, I think, makes sense to extract is:
1. (state, last move) -> (next moves list) records (joseki or places
to play when the last move is somewhere else.)
2. urgency of shapes
3. distribution of statistics (e.g. proportion of times tenuki is played
at move n) that helps making playouts more humanlike while still fast.
I finally abandoned full board Bradley-Terry scored random playouts. I
draw a random number to decide if tenuki should be played, if true I
invent a random move and use it as if it was the previous move. Else,
I play a Bradley-Terry scored random move in the 40 neighbors of the
last move. That is almost immediate in my hologram board system because
I store the mask. On other implementations it may not be so good.
About urgency of shapes:
The urgency of a BT adjusted 40 neighbors shape hits 40% in a 10 M+ sample
with about 160 K patterns. (Overlearning should not be very important
given the lack of degrees of liberty.) It hits 66% with the first 5 moves.
Hits of move 1 = 0.402048 acc = 0.402048
Hits of move 2 = 0.117102 acc = 0.519151
Hits of move 3 = 0.065450 acc = 0.584600
Hits of move 4 = 0.045592 acc = 0.630193
Hits of move 5 = 0.035190 acc = 0.665382
That sounds great news. But there is bad news of course. That doesn't go any
further. To reach 80% it needs 14 moves
Hits of move 14 = 0.006206 acc = 0.801095
And to reach 90% it needs 96 moves!
Hits of move 96 = 0.001323 acc = 0.900994
Shape is a factor that explains, perhaps 2/3 of the moves (shape alone about
40%, but shape "in the right place" a little more than 66%). Not less and
_not more_ than that. It is naive to think that the 12th move in terms of
shape is better than the 50th. But, imagine there are 250 legal moves, the
248th is a really bad move.
Can shape make a difference? I don't know. If its slow, surely not. If it
is almost free in terms of performance as in my board system, I hope it does.
Jacques.
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/