>Is anyone (besides the authors) doing research based on this?

Well, Pebbles does apply reinforcement learning (RL) to improve
its playout policy, but not in the manner described in that paper.
There are practical obstacles to applying that paper directly.

To apply that paper directly, you must have a "CrazyStone"
playout design, wherein you incrementally maintain the 3x3
neighborhood around every point on the board. Pebbles has a "Mogo"
playout design, where you check for patterns only around the last
move (or two).
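To make the distinction concrete, here is a minimal sketch of a Mogo-style playout step, where 3x3 patterns are matched only around the last move rather than maintained for every point. The board representation, function names, and the "good pattern" set are all illustrative assumptions, not anything from Pebbles or the paper.

```python
import random

# Eight neighbor offsets defining the 3x3 neighborhood around a point.
NEIGHBOR_OFFSETS = [(-1, -1), (0, -1), (1, -1),
                    (-1, 0),           (1, 0),
                    (-1, 1),  (0, 1),  (1, 1)]

def pattern_key(board, point, size=9):
    """Encode the 3x3 neighborhood of `point` as a string key.
    `board` is a dict mapping (x, y) -> 'B', 'W', or '.' (empty)."""
    x, y = point
    cells = []
    for dx, dy in NEIGHBOR_OFFSETS:
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            cells.append(board.get((nx, ny), '.'))
        else:
            cells.append('#')  # off-board edge marker
    return ''.join(cells)

def mogo_style_move(board, last_move, good_patterns, size=9):
    """Mogo-style heuristic: only the empty points adjacent to the last
    move are pattern-checked; otherwise fall back to a random empty point."""
    candidates = []
    if last_move is not None:
        x, y = last_move
        for dx, dy in NEIGHBOR_OFFSETS:
            p = (x + dx, y + dy)
            if (0 <= p[0] < size and 0 <= p[1] < size
                    and board.get(p, '.') == '.'
                    and pattern_key(board, p, size) in good_patterns):
                candidates.append(p)
    if candidates:
        return random.choice(candidates)
    empties = [(i, j) for i in range(size) for j in range(size)
               if board.get((i, j), '.') == '.']
    return random.choice(empties) if empties else None
```

A CrazyStone-style design would instead update `pattern_key` incrementally for every point affected by each move, so that all empty points can be scored at once; the code above only ever computes keys near the last move.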

Pursuing that approach directly would require a rewrite. Right now,
there is no published evidence that the Mogo design is inferior. In
fact, two of the world's best programs (Mogo and Fuego) use it.
So I am unwilling to make that commitment.

I would also have to research how to scale that paper to
realistic conditions, including

   1) 9x9 boards at a minimum.
   2) Self-play, instead of assuming an oracle.
   3) Playout after a UCT/RAVE search rather than pure MC.
   4) Pattern sets that have ~1 million parameters.
   5) Pattern sets that have more general geometry than 3x3, perhaps.
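For item 2 in particular, one simple shape such a self-play scheme could take is a softmax policy over pattern weights updated with a REINFORCE-style rule from game outcomes. This is a hedged sketch of the general technique, not the method of the paper or of Pebbles; the function names, learning rate, and data layout are all assumptions.

```python
import math

def softmax_probs(weights, pattern_ids):
    """Move probabilities proportional to exp(weight) of each
    candidate move's matched pattern."""
    exps = [math.exp(weights.get(p, 0.0)) for p in pattern_ids]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(weights, episode, outcome, lr=0.01):
    """REINFORCE-style update from one self-play game.
    `episode` is a list of (chosen_pattern, candidate_patterns) pairs,
    one per playout move; `outcome` is +1 for a win, -1 for a loss.
    Each step ascends the gradient of log-softmax, scaled by outcome."""
    for chosen, candidates in episode:
        probs = softmax_probs(weights, candidates)
        for p, prob in zip(candidates, probs):
            grad = (1.0 if p == chosen else 0.0) - prob
            weights[p] = weights.get(p, 0.0) + lr * outcome * grad
    return weights
```

Scaling this to ~1 million parameters and to playouts run beneath a UCT/RAVE search (items 3 and 4) is exactly where the open research questions would lie.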

My guess is that all of these research problems are solvable. But
that's a lot of work to do. If I had to face this task list, I
would put it off until "later," because there is always an easier
way to make progress.


_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/