I agree with much of what you say (to the degree that anyone needs to "agree" 
with questions).

The discussions on this list dealing with "ownership maps", RAVE and AMAF have 
to do with using additional information from the playouts.

Playouts can't be "unbiased." Picking a move with uniform probability is a bias 
too, and not a good one.

Computer go papers here: http://www.citeulike.org/group/5884/library

- Dave Hillis

-----Original Message-----
From: Claus Reinke <[EMAIL PROTECTED]>
To: computer-go@computer-go.org
Sent: Sun, 28 Sep 2008 10:05 am
Subject: [computer-go] Using playouts for more than position evaluation?



From browsing Monte-Carlo Go papers (*), I get the impression that random
playouts are used mainly to approximate an evaluation function, determining
some value for board positions arising in more traditional tree search.

    Is that correct? It seems somewhat wasteful to calculate all those possible
    board positions and only take a single value before throwing them away.
    Have there been any attempts to extract other information from the playouts?

    For instance, if an intersection belongs to the same colour in all playouts,
    chances are that it is fairly secure (that doesn't mean one shouldn't play
    there, sacrifices there may have an impact on other intersections).
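The "same colour in all playouts" idea is essentially an ownership map, which the reply above mentions. A minimal sketch of accumulating such statistics over playouts follows; `run_playout` is a hypothetical stand-in (here it just returns random owners so the fragment runs), not code from any real engine:

```python
import random

SIZE = 9
BLACK, WHITE = 1, 2

def run_playout(position):
    # Hypothetical stand-in for a real random playout: assign each
    # intersection a random final owner, just so the sketch executes.
    return [[random.choice((BLACK, WHITE)) for _ in range(SIZE)]
            for _ in range(SIZE)]

def ownership_map(position, n_playouts=1000):
    # counts[y][x] = how often the point ended up owned by black
    counts = [[0] * SIZE for _ in range(SIZE)]
    for _ in range(n_playouts):
        final = run_playout(position)
        for y in range(SIZE):
            for x in range(SIZE):
                if final[y][x] == BLACK:
                    counts[y][x] += 1
    # Fraction of playouts in which each point belonged to black;
    # values near 1.0 or 0.0 suggest the point is fairly secure.
    return [[c / n_playouts for c in row] for row in counts]
```

In a real program the counts would be gathered as a side effect of the playouts the search is already running, so the extra cost is small.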

    Or, if an intersection is black in all playouts won by black, and white in
    all playouts won by white, chances are that it is fairly important to play
    there (since playouts are random, there is no guarantee, but emphasizing
    such intersections, and their ordering, in the top-level tree search seems
    profitable).
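The second statistic — a point being black in black's wins and white in white's wins — can be accumulated the same way, by counting how often a point's final owner agrees with the winner. Again `run_playout` is a hypothetical placeholder returning random data so the fragment is runnable:

```python
import random

SIZE = 9
BLACK, WHITE = 1, 2

def run_playout(position):
    # Hypothetical playout: random owners and a random winner.
    owners = [[random.choice((BLACK, WHITE)) for _ in range(SIZE)]
              for _ in range(SIZE)]
    winner = random.choice((BLACK, WHITE))
    return owners, winner

def criticality_map(position, n_playouts=1000):
    # agree[y][x] = playouts where the point's final owner
    # was also the winning side.
    agree = [[0] * SIZE for _ in range(SIZE)]
    for _ in range(n_playouts):
        owners, winner = run_playout(position)
        for y in range(SIZE):
            for x in range(SIZE):
                if owners[y][x] == winner:
                    agree[y][x] += 1
    # Values near 1.0 mark points whose ownership tracks the game
    # result -- candidates to emphasize in the top-level search.
    return [[a / n_playouts for a in row] for row in agree]
```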

Secondly, I have been surprised to see Go knowledge being applied to the
random playouts - doesn't that run the danger of blinding the evaluation
function to border cases? It would seem much safer to me to keep the
random playouts unbiased, but to extract information from them to guide
the top-level tree search. Even the playout termination criterion (not filling
eyes) has to be defined fairly carefully (and there have been variations),
to avoid blinding playouts against sacrifices.

    Since most Go knowledge isn't absolute, but comes with caveats, it would
    seem that any attempt to encode Go knowledge in the playouts is risky
    (mind you, I'm a weak player, so I might be wrong;-). For instance, a
    bamboo joint connects two strings, unless (insert various exceptions here),
    so if you encode a bamboo joint as a firm connection, your playouts include
    a systematic error. Shouldn't the same hold for nearly all Go "rules"?

Thirdly, I have been trying to understand why random playouts work so
well for evaluating a game in which there is sometimes a very narrow
path to victory. Naively, it would seem that if there was a position from
which exactly one sequence of moves led to a win, but starting on that
sequence would force the opponent to stay on it, then random playouts
would evaluate that position as lost, even if the forced sequence would
make it a win.

    Is it the full search at the top of the tree that avoids this danger (every
    starting move gets explored, and for the correct starting move, random
    plays are even worse for the opponent than being forced, so the forcing
    sequence will emerge, if slowly and not certainly)? If yes, that would
    explain the "horizon effect", where Monte-Carlo programs with slightly
    deeper non-random search fare better at judging positions and squash
    their opponents even without other improvements.
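The "full search at the top" intuition can be illustrated with a flat Monte-Carlo root: every legal first move gets its own batch of playouts, so a forcing first move whose random continuations favour us accumulates a higher win rate even if no single playout follows the exact forcing sequence. `legal_moves`, `play`, and `random_playout_result` are hypothetical stand-ins with made-up win rates:

```python
import random

def legal_moves(position):
    # Hypothetical: pretend there are ten candidate first moves.
    return list(range(10))

def play(position, move):
    return position + [move]

def random_playout_result(position):
    # Hypothetical: move 3 starts the narrow winning path, so random
    # continuations from it win more often than from other moves.
    return random.random() < (0.8 if position[-1] == 3 else 0.3)

def flat_monte_carlo(position, playouts_per_move=500):
    # Every starting move is explored explicitly; only the moves
    # *after* the root are random.
    best_move, best_rate = None, -1.0
    for move in legal_moves(position):
        wins = sum(random_playout_result(play(position, move))
                   for _ in range(playouts_per_move))
        rate = wins / playouts_per_move
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

UCT-style programs refine this by growing a tree instead of a flat root, spending more playouts on promising moves, which is what pushes the random part of the evaluation deeper over time.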

    It might also explain why bots like Leela sometimes seem overconfident
    of their positions, abandoning local fights before they are entirely stable.
    Such overplay has traditionally been useful in playing against other bots,
    even though it can be punished severely by strong human players. If
    the opponent bot can't see the winning sequence, it may not continue
    the local fight, and if it does continue the local fight with anything but
    the optimal move, Leela tends to come back with strong answers, as if it
    could suddenly see the danger. Either way tends to justify Leela's playing
    elsewhere, if only against a bot opponent.

Of course, the third and second issue above are somewhat related:
if incorporating Go knowledge in the playouts is the only way to
avoid missing narrow paths to certain evaluations, one might have
to risk adding such knowledge, even if it boomerangs in other situations
(are ladders one such case, or are they better left to random evaluation?).

Ok, way too many questions already;-) I hope someone has some
answers, even if partial or consisting of references to more papers.

Claus

(*) btw, Computer Go related papers seem to be widely distributed -
    is there a central bibliography that keeps track of papers and urls?




_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
