Re: [computer-go] Go/Games with modified scores

Don Dailey Tue, 14 Oct 2008 10:40:20 -0700

On Tue, 2008-10-14 at 16:22 +0100, Claus Reinke wrote:
> > .. I think people (me included) feel that replacing a whole swath
> > of relevant information by a single number points to potentially some
> > serious inefficiency and loss of information. The fact that nobody
> > has found how to make use of the excess of information is no proof of
> > course that it can't be done. I think it's a very valid question at
> > the very least and a bit premature to call it a wrong premise. MCTS
> > programs are still in their early days of development and it's quite
> > possible some good improvements can be made by using more information
> > than the simple win/loss ratio.
> 
> First, let me state that I agree: there should be ways to get more. In
> fact, my impression is that there have been successes in making better
> use of simulation results (ownership maps, territory heuristic). Just that
> they use even more detail (final board vs final score), which makes it
> easier to extract information that is safe only over a large number of
> runs. Just looking at the sum score can hide the details that lead to
> statistically relevant information.
> 
> One might look at the distribution of sum scores, though: for instance,
> if scores are either drastic or close, with little in-between, one might
> assume that there is a pivot move (combination) to be searched for
> on which the game outcome hinges in the current position (eg, a large
> group in danger).


I have improved my simple program by considering what you call the "sum
scores", I assume you mean the average outcome of each point.    It
might be possible to incorporate that somehow into the final score, but
that is different from using "margin of victory scoring" but I guess we
are not talking about that now.   


> 
> Second, having now looked at some more random "light playouts"
> (just instrument your engine to output sgf before starting the next run),
> I feel that the name is highly misleading. These simulation runs have
> very little in common with actual play, eg, in a 19x19 run from an
> empty board, one might see an opening period, with 200 moves of
> randomly placed stones, followed by a middle period, with 100 or
> so moves occasionally making a firm connection or a true single-space
> eye, followed by an endgame with 100-200 moves exploring possible
> consequences of those little bits of random mess that cannot be ruined
> by random play.

I think "playouts" is a very good term and it stuck because it was
immediately intuitive.   There is a reason we prefix the term "light" or
"heavy."  

> 
> You can perhaps predict which side has the more-likely-connected
> strings and the more-likely-unkillable strings at 250 or 300 moves,
> and hence, which side is likely to win the "playout" in the end, but
> again those properties have very little in common with good shape
> in actual play.

"heavy playouts" is the key here.   Heavy playouts try to incorporate
good shape.  

> 
> Of course, the outcome is supposed to be nearly evenly random from
> an empty board, but if you look at move 300 and try to compare the
> strings and eyes that almost make it vs those that do, it drives home
> the message that the individual run is nearly meaningless, and the
> simulation is blind to many "obvious" features of a game position.
> Upping the number of simulations escalates some of those "nearly"s
> to significance, but care is needed to decide between significant result
> and significant error, and how to interpret either.

Agreed. 

You have a chicken and egg problem.   If you want the simulations to
reflect perfect play (not "blind" to anything) then you don't need a
search or even a full playout.  Just generate one move.   That is why
the search component of MCTS is so important.   It's also why the
playouts have to be random, otherwise you can throw out the search
component.  If you have random, you will have error. 

The mogo team determined empirically that the the quality of the
playouts in the game sense does not necessarily reflect the quality of
the final program.   

It seems to be more important to relegate interpretation of the playouts
to the search portion of the program.    Of course there is no question
that how you do the playouts is important. 



> 
> So I now prefer to call these simulation runs "legal fill-ins" and I think
> it would be worthwhile trying to improve our knowledge on what kind
> of information can be safely extracted from (sets of) them.

Yes, and I think this is well understood (that we need to improve our
knowledge) and work is being done in this regard.


> 
> Even the one-bit win/loss information appears to depend on (a) undecided
> areas being filled in such a way that they are allocated evenly to both sides
> and (b) decided areas and influence on both sides having an equal share
> of ruinable and non-ruinable aspects. This way, a Murphy-style (whatever
> can go wrong, will) random fill-in will neither see unrealistic advantages
> emerge in undecided areas nor reduce one side's advantages more strongly
> than the other one's, so even if the random scores bear little relation to the
> realistic scores, the win/loss bit would still be useable, at least over large
> enough samples.

However the "one-bit win/loss" information is the most significant bit,
and as we know not all bits are created equal.   

I agree of course that playouts are subject to error,  so if you look at
margin of victory you are looking at margin of victory generated with
the same error prone device aren't you?  

I think this comes down to information theory,  how to extract more
reliable "scoring information" based on what information is available.
I personally feel that probably 95% of what we need is carried by that
one bit of information.  


I think what makes this difficult is that you want to take the actual
results (generated by flawed playouts) by incorporating extra
information not based on the actual win/loss results, to produce a
number that reflects what the win/loss results really SHOULD HAVE BEEN.
Maybe it can be done, but I think that is going to be a very difficult
task.

The other information you are talking about is used in many programs to
modify and improve the playouts, so it's not like it's being ignored. 




> 
> Similarly, if very nearly all random fill-ins in a large enough sample agree
> on the territorial status of an intersection, that might seem another safe bit
> of information to extract.

Yes, that is being extracted but not applied to individual playout
scores, but to improve the quality of the playouts.    Do you think it
can used to make future playouts return more accurate scores?   This is
a different concept from "margin of victory" of course so again we are
talking about different methods that what started this discussion.

- Don


> 
> But that may not be entirely accurate, and "it works most of the time"
> is quite different from "statistical analysis shows an error rate of E% for
> information X after N simulation runs" (*). Even the latter leaves enough
> room both for interpretation and for significant surprises in actual play.
> 
> Below is another of those odd examples that you might want to run through
> your Monte-Carlo evaluation engine (mentally or computer-based:-). I'd
> be interested to see the results, and how they vary with number of 
> simulations.
> 
> Assuming no komi, Chinese count is 40/40, so whoever fills the center
> wins. In a simulation-based evaluation, random invasions are impossible
> in the black side, but only highly unlikely in the white side. Perhaps someone
> else can construct an example where the difference is actually significant?
> If anything, white has been more efficient in building (four moves less, as
> apparent in Japanese count), so one could say that simulation-based
> scoring based on naive play is slightly biased toward inefficient play,
> because it sees most clearly those features that cannot be ruined by
> Murphy-style play.
> 
> Simulation-based evaluation does not see the same board position we
> see, and if the evaluations have anything in common, that is because of
> assumptions like (a)/(b) above. If those assumptions are violated, all
> bets are off, so it would be good to have as complete a catalogue of
> such assumptions as possible. Has there been any work in this direction?
> 
> Claus
> 
> (*) Since statistics seem to be the only thing that saves simulation runs
>     and experiment-based reasoning from irrelevance: could anyone here
>     please suggest a good book or online tutorial for those who, like myself,
>     would like a refresher on the relevant basic aspects of statistics?
> 
> (
> ;
> FF[4]
> GM[1]
> SZ[9]
> AP[Jago:Version 5.0]
> AB[ab][ba][ad][af][ah][bi][bg][be][bc][cb][cd][cf][ch][da][dc][de][dg][di][ef][eh][eg][ei][db][dd][df][dh]
> TB[aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][eb][ec][ed][ee][ef][eg][eh][ei][fa][fb][fc][fd][fe][ff][fg][fh][fi][ga][gb][gc][gd][ge][gf][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][ea][ec][ee][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][aa][ac][ae][ag][ai]
>  
> [bb][bd][bf][bh][ca][cc][ce][cg][ci][db][dd][df][dh][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci][aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci]
> AW[ga][gb][gc][gd][ge][gf][gg][gh][gi][ff][fg][fh][fi][fe][fd][fc][fb][fa][ea][eb][ec][ed]
> TW[ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][gg][gh][gi][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii][ha]
>  [hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii]
> GN[fill-in-score]
> C[
> Chinese count:
> Black: 40, White: 40
> Japanese count:
> Black: 14, White: 18]
> )
> 
> 
> 
> 
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/

signature.asc
Description: This is a digitally signed message part

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/

Re: [computer-go] Go/Games with modified scores

Reply via email to