On Sun, Jan 31, 2016 at 03:20:16PM +0000, Greg Schmidt wrote:
> The articles I've read so far about AlphaGo mention both MCTS and
> RL/Q-Learning. Since MCTS (and certainly UCT) keeps statistics on wins and
> propagates that information up the tree, that in and of itself would seem to
> constitute RL, so how does it make sense to have both? It seems redundant to
> me. Any thoughts on that?

I agree with Alvaro's suggestion. :-)  But since the general notion is
interesting and maybe worth reiterating [1]:

  * MCTS can be conceptualized as *on-the-fly machine learning* that
    uses RL to learn how good each action is in the current context
    (or in a wider class of contexts in the case of AMAF).

  * AlphaGo also uses machine learning in a preparation stage, where
    reinforcement learning is used to find the values of actions in
    fuzzily defined contexts. This is then used as a prior for
    initializing the action values as they are "tuned on-the-fly"
    during the actual MCTS game search.

On context: in classic MCTS, the context is "this precise board
position"; we are trying to figure out the action (move) values for
each context separately and independently. [2]  However, this is pretty
wasteful, so in real engines we also use localized contexts (based on
B-T patterns, for example, or trivial tactics like atari) as priors for
these action values.

The value network provides another way to map a context to action
(move) values, and can be thought of, imho pretty accurately, as a
"smart cache" for the playout-computed action values. It's "smart"
because ANNs are fuzzy computational engines that do not require
a precisely matched board position, but will learn things like "in
contexts with this corner shape, this vital point move will have a high
action value".

[1] BTW, there has been a good influx of new mailing list subscriptions
in the last few days!

[2] AMAF is then a common prefix context, but let's ignore it as
AlphaGo doesn't use it.

-- 
				Petr Baudis
	If you have good ideas, good data and fast computers,
	you can do almost anything. -- Geoffrey Hinton
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
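
To make the "prior, then tuned on-the-fly" idea above a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not AlphaGo's (or any real engine's) actual mechanism: the names `ActionStats`, `init_from_prior`, `select_ucb1`, `update` and the `equivalent_visits` parameter are all hypothetical, and seeding the counters with "virtual" visits is just one simple way a learned prior could initialize per-position action values before playout results start adjusting them.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Win/visit counters for one move in one exact board position."""
    wins: float = 0.0
    visits: float = 0.0

    @property
    def value(self) -> float:
        # Unvisited moves get a neutral 0.5 estimate.
        return self.wins / self.visits if self.visits else 0.5


def init_from_prior(prior_values, equivalent_visits=10.0):
    """Seed the counters as if each move had already been tried
    `equivalent_visits` times with a win rate equal to its prior value.
    (Hypothetical scheme; `prior_values` maps move -> value in [0, 1].)"""
    if equivalent_visits <= 0:
        raise ValueError("equivalent_visits must be positive")
    return {
        move: ActionStats(wins=p * equivalent_visits, visits=equivalent_visits)
        for move, p in prior_values.items()
    }


def select_ucb1(stats, exploration=1.0):
    """Pick the move maximizing a UCB1-style score over the current counters."""
    total = sum(s.visits for s in stats.values())
    log_total = math.log(max(total, 1.0))

    def score(move):
        s = stats[move]
        return s.value + exploration * math.sqrt(log_total / s.visits)

    return max(stats, key=score)


def update(stats, move, won):
    """On-the-fly tuning: fold one playout result into the move's counters."""
    s = stats[move]
    s.visits += 1
    s.wins += 1.0 if won else 0.0


# The prior (e.g. from patterns or a network) says "A" looks better than "B".
stats = init_from_prior({"A": 0.7, "B": 0.4}, equivalent_visits=10.0)
first_choice = select_ucb1(stats)      # the prior steers the first pick

# If playouts through "A" keep losing, the search overrides the prior.
for _ in range(10):
    update(stats, "A", won=False)
later_choice = select_ucb1(stats)
```

The larger `equivalent_visits` is, the more playouts it takes for the search to overrule the prior; a small value makes the prior little more than a tie-breaker.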