Hi Hideki,

I think they could have used a rollout policy network (RPN), as described in "Convolutional Monte Carlo Rollouts in Go" (https://arxiv.org/abs/1512.03375), and trained it on the MCTS outcome, at the same time and in the same way as the policy head is trained. This RPN would start out playing random rollouts, then benefit from the policy head's training.

That would leave the mixing factor between rollout and value-net evaluations as the remaining "human knowledge". But the Zero training pipeline has such a mixing factor anyway, in the loss function that combines the policy and value heads.

Regards,
Patrick
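To make the two mixing factors concrete, here is a minimal Python sketch. It assumes the leaf evaluation is a simple linear blend of the value-net estimate and the rollout result, and that the training loss is a squared value error plus a policy cross-entropy (weight regularization is left out). The names mixed_leaf_value, zero_style_loss, mix and value_weight are made up for illustration and are not taken from any published code; the defaults are arbitrary.

```python
import math


def mixed_leaf_value(value_net_estimate, rollout_outcome, mix=0.5):
    """Blend a value-net estimate with a rollout result.

    mix=0 uses the value net only, mix=1 uses the rollout only.
    The default of 0.5 is an arbitrary illustrative choice.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be in [0, 1]")
    return (1.0 - mix) * value_net_estimate + mix * rollout_outcome


def zero_style_loss(z, v, search_probs, policy_probs, value_weight=1.0):
    """Squared value error plus policy cross-entropy.

    z: game outcome, v: value-head output,
    search_probs: MCTS visit distribution (training target),
    policy_probs: policy-head output (must be > 0 where the target is > 0).
    value_weight is a hypothetical knob that makes the relative weight
    of the two heads explicit.
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(
        pi * math.log(p)
        for pi, p in zip(search_probs, policy_probs)
        if pi > 0.0
    )
    return value_weight * value_loss + policy_loss


if __name__ == "__main__":
    print(mixed_leaf_value(0.2, 1.0, mix=0.5))
    print(zero_style_loss(1.0, 0.6, [0.7, 0.3], [0.5, 0.5]))
```

Either way a human picks a weight: mix in the first function, and the relative weight of the value and policy terms (value_weight here) in the second.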
Message: 1
Date: Fri, 17 Nov 2017 02:32:29 +0900
From: Hideki Kato <hideki_ka...@ybb.ne.jp>
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Is MCTS needed?
Message-ID: <5a0dcba9.8060%hideki_ka...@ybb.ne.jp>
Content-Type: text/plain; charset=US-ASCII

Hi,

I strongly believe adding rollouts makes Zero stronger. They removed rollouts just to say "no human knowledge".
#Though the number of past moves (16) has been tuned by humans :).
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go