Re: [Computer-go] AlphaGo MCTS & Reinforcement Learning?

Petr Baudis Sun, 31 Jan 2016 08:43:02 -0800

On Sun, Jan 31, 2016 at 03:20:16PM +0000, Greg Schmidt wrote:
> The articles I've read so far about AlphaGo mention both MCTS and 
> RL/Q-Learning.  Since MCTS (and certainly UCT) keeps statistics on wins and 
> propagates that information up the tree, that in and of itself would seem to 
> constitute RL, so how does it make sense to have both?  It seems redundant to 
> me.  Any thoughts on that?


I agree with Alvaro's suggestion. :-)  But since the general notion is
interesting and maybe worth re-iterating [1]:

  * MCTS can be conceptualized as *on-the-fly machine learning* that
uses RL to learn how good each action is in the current context
(or in a wider class of contexts in the case of AMAF).

  * AlphaGo also uses machine learning in a preparation stage, where
reinforcement learning is used to find values of actions in fuzzily
defined contexts.  These values then serve as a prior for initializing
the action values as they are "tuned on-the-fly" during the actual MCTS
game search.
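
  To make the "prior, then tuned on-the-fly" idea concrete, here is a
minimal sketch.  It assumes the classic "virtual wins" way of seeding
node statistics and plain UCB1 selection; it is not AlphaGo's actual
selection formula, and the names ActionStats, prior_value, prior_weight
and ucb1_select are made up for illustration.

```python
import math


class ActionStats:
    """Win/visit statistics for one move in one context (tree node)."""

    def __init__(self, prior_value=0.5, prior_weight=0.0):
        # Assumption: the offline-learned prior is injected as if the move
        # had already been played prior_weight times with mean result
        # prior_value ("virtual" playouts).
        self.visits = prior_weight
        self.wins = prior_value * prior_weight

    def update(self, result):
        # On-the-fly learning: each playout result (1 = win, 0 = loss)
        # moves the estimate away from the prior.
        self.visits += 1
        self.wins += result

    def value(self):
        # Mean result so far; 0.5 when nothing is known at all.
        return self.wins / self.visits if self.visits > 0 else 0.5


def ucb1_select(stats_by_move, c=1.0):
    """Pick the move with the highest UCB1 score (value + exploration)."""
    total = sum(s.visits for s in stats_by_move.values())

    def score(s):
        if s.visits == 0:
            return float("inf")  # untried moves are explored first
        return s.value() + c * math.sqrt(math.log(total + 1) / s.visits)

    return max(stats_by_move, key=lambda m: score(stats_by_move[m]))
```

  With prior_weight=0 this degenerates to plain UCT statistics; a larger
prior_weight means more playouts are needed before the search overrides
what was learned offline.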

  On context: in classic MCTS, the context is "this precise board
position"; we are trying to figure out the action (move) values for
each context separately and independently. [2]  However, this is pretty
wasteful, so real engines also use localized contexts (based on
Bradley-Terry patterns, for example, or trivial tactics like atari) as
priors for these action values.
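
  A rough sketch of what a "localized context" can look like: statistics
are keyed by the 3x3 neighbourhood of a candidate move rather than by
the whole board, so different positions share what they have learned.
This is a simplification under stated assumptions: it keeps a plain win
rate per pattern instead of fitting Bradley-Terry strengths, and the
board encoding, local_pattern and PatternPriors are hypothetical names.

```python
from collections import defaultdict

EDGE = "#"  # marker for points that fall off the board


def local_pattern(board, row, col):
    """Return the 3x3 neighbourhood around (row, col) as a string key.

    board is a list of equal-length strings using '.', 'X' and 'O'.
    """
    size = len(board)
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < size and 0 <= c < size:
                cells.append(board[r][c])
            else:
                cells.append(EDGE)
    return "".join(cells)


class PatternPriors:
    """Win rates shared by every position showing the same local pattern."""

    def __init__(self):
        self.wins = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, pattern, result):
        # result: 1.0 if the side that played the move won, else 0.0
        self.wins[pattern] += result
        self.visits[pattern] += 1

    def prior(self, pattern, default=0.5):
        # Fall back to a neutral value for patterns never seen before.
        n = self.visits[pattern]
        return self.wins[pattern] / n if n else default
```

  The value returned by prior() would then seed a node's action value
before any playout has gone through that move, instead of starting each
exact board position from scratch.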

  The value network provides another way to map context to action
(move) values, and can be thought of, imho pretty accurately, as a
"smart cache" for the playout-computed action values.  It's "smart"
because ANNs are fuzzy computational engines that do not require a
precisely matched board position but will learn things like "in
contexts with this corner shape, this vital point move will have high
action value".


  [1] BTW, there has been a good influx of new mailing list
subscriptions in the last few days!

  [2] AMAF is then a common prefix context, but let's ignore it as
AlphaGo doesn't use it.

-- 
                                Petr Baudis
        If you have good ideas, good data and fast computers,
        you can do almost anything. -- Geoffrey Hinton
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
