Hi Hideki,
I think they could have used a rollout policy network (RPN), as described in 
"Convolutional Monte Carlo Rollouts in Go" (https://arxiv.org/abs/1512.03375), 
and trained it on the MCTS outcome, at the same time and in the same 
way as the policy head is trained. This RPN would start out playing random 
rollouts, then benefit from the same training as the policy head.
This would leave the mixing factor between rollout and value net evaluations 
as "human knowledge". But the Zero training pipeline already has such a mixing 
factor anyway, in the loss function that combines the policy and value heads.
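
To make the two mixing factors concrete, here is a minimal Python sketch. The
first function blends a value net estimate with a rollout result using a factor
lam; the second adds the squared value error to the policy cross-entropy, which
is roughly the shape of the Zero loss (regularization term left out). The
function names, the default lam = 0.5, and the explicit value_weight parameter
are my own assumptions for illustration, not taken from any actual code.

```python
import math


def mixed_leaf_value(value_net: float, rollout_result: float, lam: float = 0.5) -> float:
    """Blend a value-net estimate with a rollout outcome.

    lam is the hand-chosen mixing factor (hypothetical default):
    lam = 0 uses only the value net, lam = 1 uses only the rollout.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * value_net + lam * rollout_result


def combined_loss(z: float, v: float, pi: list, p: list, value_weight: float = 1.0) -> float:
    """Value error plus policy cross-entropy, without regularization.

    z: game outcome, v: value head output,
    pi: MCTS search probabilities, p: policy head probabilities.
    value_weight is a hypothetical knob; 1.0 weights both terms equally.
    """
    value_loss = (z - v) ** 2
    # Skip zero-probability targets so log(0) is never evaluated.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0.0)
    return value_weight * value_loss + policy_loss
```

A rollout policy network trained "in the same way as the policy head" would
simply get its own cross-entropy term against the same search probabilities pi.
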
Regards, Patrick



Message: 1
Date: Fri, 17 Nov 2017 02:32:29 +0900
From: Hideki Kato <hideki_ka...@ybb.ne.jp>
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Is MCTS needed?
Message-ID: <5a0dcba9.8060%hideki_ka...@ybb.ne.jp>
Content-Type: text/plain; charset=US-ASCII

Hi,

I strongly believe adding rollouts would make Zero stronger.
They removed rollouts just to say "no human knowledge".
#Though the number of past moves (16) has been tuned by 
a human :).


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
