>Is anyone (besides the authors) doing research based on this?

Well, Pebbles does apply reinforcement learning (RL) to improve its playout policy, but not in the manner described in that paper. There are practical obstacles to applying that paper directly.
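Both playout designs discussed below come down to matching small local patterns. As background, here is a minimal sketch (all names hypothetical; this is not Pebbles' or CrazyStone's actual code) of encoding the 3x3 neighborhood of a point as a single integer index, which could key into a table of learned pattern weights:

```python
# Hypothetical illustration: a board is a dict from (x, y) to 'b' or 'w';
# absent keys are empty points.

NEIGHBORHOOD = [(-1, -1), (0, -1), (1, -1),
                (-1,  0),          (1,  0),
                (-1,  1), (0,  1), (1,  1)]

def pattern_3x3(board, x, y, size=9):
    """Encode the 3x3 neighborhood around (x, y) as a base-4 integer.
    Point codes: 0 = empty, 1 = black, 2 = white, 3 = off-board edge."""
    codes = {'b': 1, 'w': 2}
    index = 0
    for dx, dy in NEIGHBORHOOD:
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            code = codes.get(board.get((nx, ny)), 0)
        else:
            code = 3  # off-board edge
        index = index * 4 + code
    return index
```

The design difference is in when this function runs: a CrazyStone-style engine stores such an index for every point and updates the affected neighbors incrementally after each move, while a Mogo-style engine computes it only around the last move or two.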
To apply that paper directly, you need a "CrazyStone" playout design, in which you maintain a 3x3 neighborhood around each point. Pebbles has a "Mogo" playout design, in which you check for patterns only around the last move (or two). Pursuing this directly would require a rewrite. Right now there is no published evidence that the Mogo design is inferior; in fact, two of the world's best programs (Mogo, Fuego) use it. So I am unwilling to make that commitment.

I would also have to research how to scale that paper to realistic conditions, including:

1) 9x9 boards, at a minimum.
2) Self-play, instead of assuming an oracle.
3) Playouts after a UCT/RAVE search, rather than pure Monte Carlo.
4) Pattern sets that have ~1 million parameters.
5) Pattern sets with more general geometry than 3x3, perhaps.

My guess is that all of these research problems are solvable. But that's a lot of work to do, and if I had to face this task list I would put it off until "later," because there is always an easier way to make progress.

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
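For concreteness, "applying RL to improve a playout policy" in the self-play setting of item 2 might look roughly like the following. This is a generic REINFORCE-style sketch over pattern weights, with all names hypothetical; it is neither Pebbles' code nor the exact algorithm from the paper under discussion:

```python
import math
import random

def softmax_pick(weights, candidates, rng):
    """Sample a candidate move's pattern index with probability
    proportional to exp(weight) -- a Boltzmann playout policy."""
    exps = [math.exp(weights[c]) for c in candidates]
    total = sum(exps)
    r = rng.random() * total
    for c, e in zip(candidates, exps):
        r -= e
        if r <= 0:
            return c
    return candidates[-1]

def reinforce_update(weights, chosen, candidates, reward, alpha=0.01):
    """One policy-gradient step for the softmax policy above.
    Uses grad log pi(chosen) = 1[c == chosen] - pi(c), scaled by the
    playout's reward (e.g. +1 for a win, -1 for a loss)."""
    exps = {c: math.exp(weights[c]) for c in candidates}
    total = sum(exps.values())
    for c in candidates:
        prob = exps[c] / total
        grad = (1.0 if c == chosen else 0.0) - prob
        weights[c] += alpha * reward * grad
```

Rewarded patterns gain weight relative to the alternatives that were available, so the playout policy drifts toward moves that correlate with winning self-play games. Scaling this to ~1 million parameters and to playouts that follow a UCT/RAVE search is exactly the open work described above.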