On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 top-level children have been
> explored and revealed to have negative values. So without initializing Q
> to the parent's Q, you would end up wasting 362 MCTS iterations.

Note that the same argument could be made for initializing Q to 0, which
some people think the AGZ paper implies, so the above can't be the entire
explanation.

That said, empirical testing indicates that initializing Q(s, a) to the
parent's Q is indeed a well-performing setting for both strong and weak
policy networks.
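
To make the difference concrete, here is a minimal sketch of a PUCT-style
selection step where only the Q assigned to unvisited children changes. It
is an illustration under stated assumptions, not the code of any particular
engine: the names Node, select_child, c_puct and the init modes are
hypothetical, values are assumed to lie in [-1, 1], and for simplicity all
values are stored from the point of view of the player to move at the
parent.

```python
import math


class Node:
    """Hypothetical search-tree node; the field names are illustrative."""

    def __init__(self, prior):
        self.prior = prior      # policy network prior P(s, a)
        self.visits = 0         # visit count N(s, a)
        self.value_sum = 0.0    # sum of backed-up values
        self.children = {}      # move -> Node

    def q(self, default):
        # Mean value if visited, otherwise the supplied initial value.
        return self.value_sum / self.visits if self.visits else default


def select_child(node, c_puct=1.5, init="parent"):
    """Return the move maximizing Q + U.

    init="parent": unvisited children start at the parent's current Q.
    init="zero":   unvisited children start at 0.
    """
    if init == "parent":
        default_q = node.q(0.0)
    elif init == "zero":
        default_q = 0.0
    else:
        raise ValueError("unknown init mode: " + init)

    sqrt_n = math.sqrt(max(1, node.visits))

    def score(item):
        _, child = item
        u = c_puct * child.prior * sqrt_n / (1 + child.visits)
        return child.q(default_q) + u

    return max(node.children.items(), key=score)[0]
```

In a losing position where the policy strongly prefers one move, the "zero"
setting makes every unvisited sibling look better than the visited move, so
the search spreads out over low-prior moves. With "parent", the Q terms
start out equal and the choice is driven by the prior and visit counts.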

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
