The articles I've read so far about AlphaGo mention both MCTS and 
RL/Q-Learning.  Since MCTS (and certainly UCT) keeps statistics on wins and 
propagates that information up the tree, that in and of itself would seem to 
constitute RL, so how does it make sense to have both?  It seems redundant to 
me.  Any thoughts on that?
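
To make the question concrete, here is a minimal sketch in Python of the UCT bookkeeping I have in mind. The names Node, backpropagate and ucb1 are hypothetical, chosen for illustration and not taken from AlphaGo or any particular Go program, and the exploration constant 1.4 is an arbitrary assumption.

```python
import math


class Node:
    """One tree node holding the win/visit statistics UCT keeps."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0.0


def backpropagate(node, result):
    """Propagate a playout result from a leaf up to the root.

    `result` is 1.0 for a win and 0.0 for a loss, seen from the player
    who made the move leading into `node`. The perspective flips at
    each level because the players alternate.
    """
    while node is not None:
        node.visits += 1
        node.wins += result
        result = 1.0 - result
        node = node.parent


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score used to select a child; c is an assumed constant."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.wins / child.visits  # running estimate of the move's value
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)
```

The running mean wins / visits in each node is the part that looks to me like a learned value estimate, which is why the combination with a separate RL step seems redundant.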
Computer-go mailing list
