I agree with much of what you say (to the degree that anyone needs to "agree" with questions).
The discussions on this list dealing with "ownership maps", RAVE, and AMAF have to do with using additional information from the playouts. Playouts can't be "unbiased": picking a move with uniform probability is a bias too, and not a good one.

Computer go papers here: http://www.citeulike.org/group/5884/library

- Dave Hillis

-----Original Message-----
From: Claus Reinke <[EMAIL PROTECTED]>
To: computer-go@computer-go.org
Sent: Sun, 28 Sep 2008 10:05 am
Subject: [computer-go] Using playouts for more than position evaluation?

From browsing Monte-Carlo Go papers (*), I get the impression that random playouts are used mainly to approximate an evaluation function, determining some value for board positions arising in more traditional tree search. Is that correct?

It seems somewhat wasteful to calculate all those possible board positions and take only a single value before throwing them away. Have there been any attempts to extract other information from the playouts? For instance, if an intersection belongs to the same colour in all playouts, chances are that it is fairly secure (that doesn't mean one shouldn't play there; sacrifices there may have an impact on other intersections). Or, if an intersection is black in all playouts won by black, and white in all playouts won by white, chances are that it is fairly important to play there (since playouts are random, there is no guarantee, but emphasizing such intersections, and their ordering, in the top-level tree search seems profitable).

Secondly, I have been surprised to see Go knowledge being applied to the random playouts - doesn't that run the danger of blinding the evaluation function to border cases? It would seem much safer to me to keep the random playouts unbiased, but to extract information from them to guide the top-level tree search. Even the playout termination criterion (not filling eyes) has to be defined fairly carefully (and there have been variations), to avoid blinding playouts against sacrifices.
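The two per-intersection statistics suggested above (a point secure because it always ends up the same colour, and a point important because its final colour tracks the winner) could be accumulated over playouts roughly like this. This is only a sketch: all names are invented, and `random_playout` is a stand-in that just randomizes final owners rather than playing a real game.

```python
import random

BLACK, WHITE = 1, -1

def random_playout(n_points, rng):
    """Stand-in for a real playout engine: assigns each intersection a
    random final owner and scores the result by simple majority.  A real
    engine would play legal random moves to the end of the game."""
    owners = [rng.choice((BLACK, WHITE)) for _ in range(n_points)]
    winner = BLACK if owners.count(BLACK) * 2 > n_points else WHITE
    return owners, winner

def playout_statistics(n_points=81, n_playouts=2000, seed=0):
    """Accumulate two statistics per intersection over many playouts:
      ownership[p]   - fraction of playouts in which p ends up black
                       (near 1.0 or 0.0 suggests the point is secure);
      correlation[p] - P(owner(p) == winner) minus what independence of
                       ownership and winning would predict (high values
                       mark points worth emphasizing in the tree search).
    """
    rng = random.Random(seed)
    records = [random_playout(n_points, rng) for _ in range(n_playouts)]
    p_bwin = sum(1 for _, w in records if w == BLACK) / n_playouts
    ownership, correlation = [], []
    for p in range(n_points):
        p_black = sum(1 for o, _ in records if o[p] == BLACK) / n_playouts
        agree = sum(1 for o, w in records if o[p] == w) / n_playouts
        ownership.append(p_black)
        correlation.append(agree - (p_black * p_bwin + (1 - p_black) * (1 - p_bwin)))
    return ownership, correlation
```

The bookkeeping is cheap relative to the playouts themselves, which is part of why schemes along these lines seem attractive: the information is generated anyway and currently discarded.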
Since most Go knowledge isn't absolute, but comes with caveats, it would seem that any attempt to encode Go knowledge in the playouts is risky (mind you, I'm a weak player, so I might be wrong;-). For instance, a bamboo joint connects two strings, unless (insert various exceptions here), so if you encode a bamboo joint as a firm connection, your playouts include a systematic error. Shouldn't the same hold for nearly all Go "rules"?

Thirdly, I have been trying to understand why random playouts work so well for evaluating a game in which there is sometimes a very narrow path to victory. Naively, it would seem that if there was a position from which exactly one sequence of moves led to a win, but starting on that sequence would force the opponent to stay on it, then random playouts would evaluate that position as lost, even though the forced sequence would make it a win. Is it the full search at the top of the tree that avoids this danger (every starting move gets explored, and for the correct starting move, random replies are even worse for the opponent than being forced, so the forcing sequence will emerge, if slowly and not certainly)?

If yes, that would explain the "horizon effect", where Monte-Carlo programs with slightly deeper non-random search fare better at judging positions and squash their opponents even without other improvements. It might also explain why bots like Leela sometimes seem overconfident of their positions, abandoning local fights before they are entirely stable. Such overplay has traditionally been useful in playing against other bots, even though it can be punished severely by strong human players. If the opponent bot can't see the winning sequence, it may not continue the local fight, and if it does continue the local fight with anything but the optimal move, Leela tends to come back with strong answers, as if it could suddenly see the danger. Either way tends to justify Leela's playing elsewhere, if only against a bot opponent.
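The "narrow path" worry can be made concrete with a deliberately tiny toy game (everything here is invented for illustration, not taken from any real program): Black wins only if all three of his moves follow one fixed line, and White's replies never matter. Uniform-random playouts from the root then value the position near (1/3)^3, far from its true minimax value of 1, yet comparing playout values one ply down already separates the correct first move from the others:

```python
import random

def narrow_path_value(first_move, m=3, black_plies=3, n=20000, rng=None):
    """Toy game: Black wins iff every one of his `black_plies` moves is 0
    (a single narrow winning line); White's replies are irrelevant.
    Estimates Black's win rate by uniform-random playouts after a fixed
    first Black move.  `m` is the number of moves available at each turn."""
    rng = rng or random.Random(42)
    wins = 0
    for _ in range(n):
        black_moves = [first_move] + [rng.randrange(m) for _ in range(black_plies - 1)]
        if all(mv == 0 for mv in black_moves):
            wins += 1
    return wins / n
```

Here `narrow_path_value(0)` comes out near 1/9, while every other first move scores exactly 0, so the full expansion at the top of the tree is indeed what rescues the narrow line: the correct child stands out even though its playout estimate is still far below its minimax value, and the effect compounds as the tree deepens along that line.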
Of course, the third and second issues above are somewhat related: if incorporating Go knowledge in the playouts is the only way to avoid missing narrow paths to certain evaluations, one might have to risk adding such knowledge, even if it boomerangs in other situations (are ladders one such case, or are they better left to random evaluation?).

Ok, way too many questions already;-) I hope someone has some answers, even if partial or consisting of references to more papers.

Claus

(*) btw, Computer Go related papers seem to be widely distributed - is there a central bibliography that keeps track of papers and urls?

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/