> .. I think people (me included) feel that replacing a whole swath
> of relevant information by a single number points to potentially some
> serious inefficiency and loss of information. The fact that nobody
> has found how to make use of the excess of information is no proof of
> course that it can't be done. I think it's a very valid question at
> the very least and a bit premature to call it a wrong premise. MCTS
> programs are still in their early days of development and it's quite
> possible some good improvements can be made by using more information
> than the simple win/loss ratio.

First, let me state that I agree: there should be ways to get more. In
fact, my impression is that there have been successes in making better
use of simulation results (ownership maps, territory heuristic). Just that
they use even more detail (final board vs final score), which makes it
easier to extract information that is safe only over a large number of
runs. Just looking at the sum score can hide the details that lead to
statistically relevant information.

One might look at the distribution of sum scores, though: for instance,
if scores are either drastic or close, with little in-between, one might
assume that there is a pivot move (combination) to be searched for
on which the game outcome hinges in the current position (eg, a large
group in danger).

Second, having now looked at some more random "light playouts"
(just instrument your engine to output sgf before starting the next run),
I feel that the name is highly misleading. These simulation runs have
very little in common with actual play, eg, in a 19x19 run from an
empty board, one might see an opening period, with 200 moves of
randomly placed stones, followed by a middle period, with 100 or
so moves occasionally making a firm connection or a true single-space
eye, followed by an endgame with 100-200 moves exploring possible
consequences of those little bits of random mess that cannot be ruined
by random play.

You can perhaps predict which side has the more-likely-connected
strings and the more-likely-unkillable strings at 250 or 300 moves,
and hence, which side is likely to win the "playout" in the end, but
again those properties have very little in common with good shape
in actual play.

Of course, the outcome is supposed to be nearly evenly random from
an empty board, but if you look at move 300 and try to compare the
strings and eyes that almost make it vs those that do, it drives home
the message that the individual run is nearly meaningless, and the
simulation is blind to many "obvious" features of a game position.
Upping the number of simulations escalates some of those "nearly"s
to significance, but care is needed to decide between significant result
and significant error, and how to interpret either.

So I now prefer to call these simulation runs "legal fill-ins" and I think
it would be worthwhile trying to improve our knowledge on what kind
of information can be safely extracted from (sets of) them.

Even the one-bit win/loss information appears to depend on (a) undecided
areas being filled in such a way that they are allocated evenly to both sides
and (b) decided areas and influence on both sides having an equal share
of ruinable and non-ruinable aspects. This way, a Murphy-style (whatever
can go wrong, will) random fill-in will neither see unrealistic advantages
emerge in undecided areas nor reduce one side's advantages more strongly
than the other one's, so even if the random scores bear little relation to the
realistic scores, the win/loss bit would still be useable, at least over large
enough samples.

Similarly, if very nearly all random fill-ins in a large enough sample agree
on the territorial status of an intersection, that might seem another safe bit
of information to extract.

But that may not be entirely accurate, and "it works most of the time"
is quite different from "statistical analysis shows an error rate of E% for
information X after N simulation runs" (*). Even the latter leaves enough
room both for interpretation and for significant surprises in actual play.

Below is another of those odd examples that you might want to run through
your Monte-Carlo evaluation engine (mentally or computer-based:-). I'd
be interested to see the results, and how they vary with number of simulations.

Assuming no komi, Chinese count is 40/40, so whoever fills the center
wins. In a simulation-based evaluation, random invasions are impossible
in the black side, but only highly unlikely in the white side. Perhaps someone
else can construct an example where the difference is actually significant?
If anything, white has been more efficient in building (four moves less, as
apparent in Japanese count), so one could say that simulation-based
scoring based on naive play is slightly biased toward inefficient play,
because it sees most clearly those features that cannot be ruined by
Murphy-style play.

Simulation-based evaluation does not see the same board position we
see, and if the evaluations have anything in common, that is because of
assumptions like (a)/(b) above. If those assumptions are violated, all
bets are off, so it would be good to have as complete a catalogue of
such assumptions as possible. Has there been any work in this direction?

Claus

(*) Since statistics seem to be the only thing that saves simulation runs
    and experiment-based reasoning from irrelevance: could anyone here
    please suggest a good book or online tutorial for those who, like myself,
    would like a refresher on the relevant basic aspects of statistics?

(
;
FF[4]
GM[1]
SZ[9]
AP[Jago:Version 5.0]
AB[ab][ba][ad][af][ah][bi][bg][be][bc][cb][cd][cf][ch][da][dc][de][dg][di][ef][eh][eg][ei][db][dd][df][dh]
TB[aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][eb][ec][ed][ee][ef][eg][eh][ei][fa][fb][fc][fd][fe][ff][fg][fh][fi][ga][gb][gc][gd][ge][gf][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][aa][ac][ae][ag][ai]
 
[bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci]
AW[ga][gb][gc][gd][ge][gf][gg][gh][gi][ff][fg][fh][fi][fe][fd][fc][fb][fa][ea][eb][ec][ed]
TW[ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha]
 [hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii]
GN[fill-in-score]
C[
Chinese count:
Black: 40, White: 40
Japanese count:
Black: 14, White: 18]
)




_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Reply via email to