Random playouts - even heavy playouts - are not intended to emulate a human player. Heikki is exactly right.
It's a crude measurement of how good the position is. The moves in a random playout are horrible, and so are the moves in a heavy playout. In fact, improving them arbitrarily will probably hurt the playouts. Random playouts work reasonably well because, to a certain extent, bad moves cancel each other out. Chrilly called this "mutual stupidity", I think. For example, a group of stones may be weak and ultimately lost, but it is still possible to defend that group if the attacker isn't diligent. What happens in the playouts is that the attacker is NOT diligent, but neither is the defender! So the result comes out "correct" anyway. Of course this is not reliable, but it's amazingly good at getting a reasonable picture of what is weak, what is strong, and who owns what.

You can improve that by adding some knowledge to the playouts, but you must do this with great care. In my example above, let's say you add a piece of knowledge that causes the defender to succeed. You can argue that the playouts "play better go" now, but the conclusion you come to for the group we are talking about is now wrong.

- Don

On Sun, 2008-11-16 at 10:08 +0100, Heikki Levanto wrote:
> On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
> > Being a computer scientist but new to go, I can grasp some of the
> > theory. The question I was trying to get across was:
> >
> > In a game of self play, if both parties are employing only Monte
> > Carlo, surely it's not a good conceptual representation of a human,
> > and if the reinforcement learning is based on random simulations,
> > wouldn't it be very weak when playing a real human?
>
> Here is another amateur answering.
>
> The way I understand it, modern Monte Carlo programs do not even try
> to emulate a human player with a random player - obviously that would
> not work.
>
> What they do is build a quite traditional search tree starting from
> the current position. They use a random playout as a crude way to
> evaluate a position. Based on this evaluation, they decide which
> branch of the tree to expand.
>
> This is the way I understand the random playouts: if, in a given
> position, White is clearly ahead, he will win the game if both sides
> play perfect moves. He is also likely to win if both sides play
> reasonably good moves (say, like human amateurs), but there is a bit
> more of a chance that one player hits upon a good combination which
> the other misses, so the result is not quite as reliable. If the
> playouts are totally random, there is still a better chance for White
> to win, because both sides make equally bad moves. The results have
> much more variation, of course. So far this does not sound like a
> very good proposal, but things change when you consider that we don't
> have perfect oracles, and that good humans are slow to play out a
> position and cannot be integrated into a computer program, whereas
> random playouts can be done awfully fast - tens of thousands of
> playouts per second. Averaging the results gives a fair indication of
> who is more likely to win from that position, which is just what is
> needed to decide which part of the search tree to expand.
>
> The 'random' playouts are not totally random: they include a minimum
> of tactical rules (do not fill your own eyes, do not pass as long as
> there are valid moves). Even this little will produce a few blind
> spots - moves that the playouts cannot see - and systematically wrong
> results.
> Adding more go-specific knowledge can make the results much better
> (more likely to be right), but can also add some more blind spots.
> And it costs time, reducing the number of playouts the program can
> make.
>
> Hope that explains something of the mystery.
>
> Regards
>    Heikki
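To make the playout mechanics Heikki describes concrete, here is a minimal Python sketch of that evaluation loop: play essentially random moves to the end of the game, enforcing only the two tactical rules he mentions, repeat many times, and average the results. The Board type and its methods (legal_moves, is_own_eye, play, pass_move, winner, copy) are a hypothetical API invented for illustration, not the interface of any actual program.

    import random

    def random_playout(board, max_moves=400):
        """Play uniformly random moves to the end of the game and report
        the winner. Only two tactical rules are enforced: never fill one
        of your own eyes, and never pass while a valid move remains.
        (Board and its methods are a hypothetical API, shown only to
        illustrate the idea.)"""
        passes = 0
        for _ in range(max_moves):
            moves = [m for m in board.legal_moves()
                     if not board.is_own_eye(m)]   # rule 1: keep own eyes intact
            if not moves:
                board.pass_move()                  # rule 2: pass only when forced
                passes += 1
                if passes == 2:                    # two consecutive passes end the game
                    break
                continue
            passes = 0
            board.play(random.choice(moves))       # otherwise: a uniformly random move
        return board.winner()                      # +1 if White wins, -1 if Black

    def estimate_win_rate(position, n_playouts=10_000):
        """Average many fast playouts from one position. This mean is the
        crude evaluation the tree search uses to decide which branch of
        the tree to expand next."""
        wins = sum(random_playout(position.copy()) == +1
                   for _ in range(n_playouts))
        return wins / n_playouts                   # estimated probability White wins

The individual games are as horrible as Don says; only the average over thousands of them carries any signal.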
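And to illustrate the trade-off both Don and Heikki warn about, here is an equally hypothetical sketch of a "heavy" playout move policy, reusing the Board API and the random import from the sketch above. It biases move selection toward captures and atari escapes; captures_stones and saves_atari are invented helper names, not real functions from any program.

    def heavy_playout_policy(board, bias=0.8):
        """Pick one move for a 'heavy' playout: prefer tactical moves
        (captures, atari escapes) most of the time, otherwise fall back
        to a uniformly random legal move. Illustrative only; the helper
        predicates captures_stones/saves_atari are assumptions."""
        moves = [m for m in board.legal_moves() if not board.is_own_eye(m)]
        if not moves:
            return None                            # nothing left but passing
        tactical = [m for m in moves
                    if board.captures_stones(m) or board.saves_atari(m)]
        # Keep some randomness so the added knowledge never dominates
        # completely; a deterministic rule is exactly what creates the
        # systematic blind spots discussed above.
        if tactical and random.random() < bias:
            return random.choice(tactical)
        return random.choice(moves)

The bias parameter is the knob the thread is really about: at 0.0 you are back to Don's mutually stupid random playouts, and as it approaches 1.0 the playouts "play better go" but cost more time per move, and a position like Don's weak group - where the defender now always succeeds - can flip to a systematically wrong evaluation.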