> .. I think people (me included) feel that replacing a whole swath
> of relevant information by a single number points to potentially some
> serious inefficiency and loss of information. The fact that nobody
> has found how to make use of the excess of information is no proof of
> course that it can't be done. I think it's a very valid question at
> the very least and a bit premature to call it a wrong premise. MCTS
> programs are still in their early days of development and it's quite
> possible some good improvements can be made by using more information
> than the simple win/loss ratio.

First, let me state that I agree: there should be ways to get more. In fact, my impression is that there have been successes in making better use of simulation results (ownership maps, territory heuristic). It is just that they use even more detail (final board vs. final score), which makes it easier to extract information that is safe only over a large number of runs. Just looking at the sum score can hide the details that lead to statistically relevant information. One might look at the distribution of sum scores, though: for instance, if scores are either drastic or close, with little in between, one might assume that there is a pivot move (or combination) to be searched for, on which the game outcome hinges in the current position (e.g., a large group in danger).

Second, having now looked at some more random "light playouts" (just instrument your engine to output SGF before starting the next run), I feel that the name is highly misleading. These simulation runs have very little in common with actual play. E.g., in a 19x19 run from an empty board, one might see an opening period with 200 moves of randomly placed stones, followed by a middle period with 100 or so moves occasionally making a firm connection or a true single-space eye, followed by an endgame with 100-200 moves exploring possible consequences of those little bits of random mess that cannot be ruined by random play. You can perhaps predict which side has the more-likely-connected strings and the more-likely-unkillable strings at 250 or 300 moves, and hence which side is likely to win the "playout" in the end, but again those properties have very little in common with good shape in actual play.

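To make the remark about the distribution of sum scores a little more concrete, here is a minimal sketch of how one might flag a "drastic or close, little in between" pattern. Everything in it is an assumption for illustration: the function names, the idea of feeding it one final score difference (Black minus White) per run, and all the thresholds, which would need tuning against real playout data.

```python
from typing import Dict, Sequence


def split_by_margin(scores: Sequence[float],
                    close: float = 5.0,
                    drastic: float = 30.0) -> Dict[str, float]:
    """Return the fraction of runs whose margin is close, in between, or drastic.

    `scores` holds one final score difference (Black minus White) per run.
    `close` and `drastic` are arbitrary illustration thresholds, in points.
    """
    if not scores:
        raise ValueError("need at least one score")
    if not 0 <= close < drastic:
        raise ValueError("need 0 <= close < drastic")
    n = len(scores)
    n_close = sum(1 for s in scores if abs(s) <= close)
    n_drastic = sum(1 for s in scores if abs(s) >= drastic)
    n_between = n - n_close - n_drastic
    return {
        "close": n_close / n,
        "between": n_between / n,
        "drastic": n_drastic / n,
    }


def looks_pivotal(scores: Sequence[float],
                  close: float = 5.0,
                  drastic: float = 30.0,
                  max_between: float = 0.1,
                  min_mode: float = 0.25) -> bool:
    """Heuristic: both extremes are well populated and the middle is nearly empty.

    `max_between` and `min_mode` are hypothetical cut-offs, not tested values.
    """
    parts = split_by_margin(scores, close, drastic)
    return (parts["between"] <= max_between
            and parts["close"] >= min_mode
            and parts["drastic"] >= min_mode)
```

A True result would only be a hint that something like a large group in danger is worth searching for; with few runs, such a split can easily arise by chance.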
Of course, the outcome is supposed to be nearly evenly random from an empty board, but if you look at move 300 and try to compare the strings and eyes that almost make it vs. those that do, it drives home the message that the individual run is nearly meaningless, and the simulation is blind to many "obvious" features of a game position. Upping the number of simulations escalates some of those "nearly"s to significance, but care is needed to decide between significant result and significant error, and how to interpret either.

So I now prefer to call these simulation runs "legal fill-ins", and I think it would be worthwhile trying to improve our knowledge of what kind of information can be safely extracted from (sets of) them.

Even the one-bit win/loss information appears to depend on (a) undecided areas being filled in such a way that they are allocated evenly to both sides and (b) decided areas and influence on both sides having an equal share of ruinable and non-ruinable aspects. This way, a Murphy-style (whatever can go wrong, will) random fill-in will neither see unrealistic advantages emerge in undecided areas nor reduce one side's advantages more strongly than the other one's, so even if the random scores bear little relation to the realistic scores, the win/loss bit would still be usable, at least over large enough samples.

Similarly, if very nearly all random fill-ins in a large enough sample agree on the territorial status of an intersection, that might seem another safe bit of information to extract. But that may not be entirely accurate, and "it works most of the time" is quite different from "statistical analysis shows an error rate of E% for information X after N simulation runs" (*). Even the latter leaves enough room both for interpretation and for significant surprises in actual play.

Below is another of those odd examples that you might want to run through your Monte-Carlo evaluation engine (mentally or computer-based :-).
I'd be interested to see the results, and how they vary with the number of simulations. Assuming no komi, Chinese count is 40/40, so whoever fills the center wins. In a simulation-based evaluation, random invasions are impossible on the black side, but only highly unlikely on the white side. Perhaps someone else can construct an example where the difference is actually significant?

If anything, white has been more efficient in building (four moves fewer, as apparent in the Japanese count), so one could say that simulation-based scoring based on naive play is slightly biased toward inefficient play, because it sees most clearly those features that cannot be ruined by Murphy-style play.

Simulation-based evaluation does not see the same board position we see, and if the evaluations have anything in common, that is because of assumptions like (a)/(b) above. If those assumptions are violated, all bets are off, so it would be good to have as complete a catalogue of such assumptions as possible. Has there been any work in this direction?

Claus

(*) Since statistics seem to be the only thing that saves simulation runs and experiment-based reasoning from irrelevance: could anyone here please suggest a good book or online tutorial for those who, like myself, would like a refresher on the relevant basic aspects of statistics?

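As a rough illustration of what a statement of the form "error rate of E% after N simulation runs" could look like for the win/loss bit, here is a small sketch computing a Wilson score interval for the win rate. It assumes each run is an independent win/loss trial with a fixed win probability, which is exactly the kind of assumption that should be questioned; the function name and the default z = 1.96 (roughly 95% coverage) are my own choices for the example.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate estimated from `runs` playouts.

    Assumes independent runs with a fixed win probability. It quantifies
    sampling noise only, not any systematic bias of the fill-ins.
    """
    if runs <= 0 or not 0 <= wins <= runs:
        raise ValueError("need runs > 0 and 0 <= wins <= runs")
    p = wins / runs
    z2 = z * z
    denom = 1.0 + z2 / runs
    centre = (p + z2 / (2.0 * runs)) / denom
    half = z * math.sqrt(p * (1.0 - p) / runs + z2 / (4.0 * runs * runs)) / denom
    return centre - half, centre + half


def win_bit_is_supported(wins: int, runs: int, z: float = 1.96) -> bool:
    """True if the whole interval lies on one side of 0.5."""
    lo, hi = wilson_interval(wins, runs, z)
    return lo > 0.5 or hi < 0.5
```

For example, 52 wins in 100 runs gives an interval that still contains 0.5, so the win/loss bit is not yet supported at that sample size, while the same 52% rate over 10000 runs is. Neither result says anything about whether assumptions like (a)/(b) hold for the position.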
(
;
FF[4]
GM[1]
SZ[9]
AP[Jago:Version 5.0]
AB[ab][ba][ad][af][ah][bi][bg][be][bc][cb][cd][cf][ch][da][dc][de][dg][di][ef][eh][eg][ei][db][dd][df][dh]
TB[aa][ac][ae][ag][ai][bb][bd][bf][bh][ca][cc][ce][cg][ci]
AW[ga][gb][gc][gd][ge][gf][gg][gh][gi][ff][fg][fh][fi][fe][fd][fc][fb][fa][ea][eb][ec][ed]
TW[ha][hb][hc][hd][he][hf][hg][hh][hi][ia][ib][ic][id][ie][if][ig][ih][ii]
GN[fill-in-score]
C[
Chinese count: Black: 40, White: 40
Japanese count: Black: 14, White: 18]
)

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/