On 03-12-17 21:39, Brian Lee wrote:
> It should default to the Q of the parent node. Otherwise, let's say that
> the root node is a losing position. Upon choosing a followup move, the Q
> will be updated to a very negative value, and that node won't get
> explored again - at least until all 362 top-level children have been
> explored and revealed to have negative values. So without initializing Q
> to the parent's Q, you would end up wasting 362 MCTS iterations.
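
To make the quoted scenario concrete, here is a minimal sketch of a single
PUCT selection step at a losing root. It is not taken from any particular
engine. The value scale of [-1, 1], the prior distribution, the constant
c_puct, the "1 + total" term under the square root, and the function name
count_visited_children are all assumptions made up for illustration. Every
child is assumed to evaluate to the same losing value, and the only thing
that varies is the placeholder Q given to unvisited children.

```python
import math


def count_visited_children(init_q, n_children=362, n_sims=50,
                           child_value=-0.9, c_puct=1.5):
    """Run n_sims PUCT selections at one root and count distinct children visited.

    init_q is the placeholder Q assigned to children that have no visits yet.
    All numbers here are hypothetical; values are on an assumed [-1, 1] scale
    from the point of view of the player to move at the root.
    """
    # Hypothetical policy prior: child 0 gets half the mass,
    # the remaining children share the other half equally.
    priors = [0.5] + [0.5 / (n_children - 1)] * (n_children - 1)
    visits = [0] * n_children
    q = [init_q] * n_children

    for _ in range(n_sims):
        sqrt_total = math.sqrt(1 + sum(visits))

        def score(a):
            # PUCT-style score: value estimate plus prior-weighted exploration bonus.
            return q[a] + c_puct * priors[a] * sqrt_total / (1 + visits[a])

        best = max(range(n_children), key=score)
        visits[best] += 1
        # Running mean of observed values; on the first visit this
        # overwrites the placeholder with the observed value.
        q[best] += (child_value - q[best]) / visits[best]

    return sum(1 for v in visits if v > 0)


if __name__ == "__main__":
    parent_q = -0.9  # assumed value estimate of the losing root
    print("init Q = parent Q:", count_visited_children(parent_q))
    print("init Q = 0:       ", count_visited_children(0.0))
```

With these made-up numbers, initializing to the parent's Q keeps the search
on the top-prior move, while initializing to 0 makes every unvisited child
look better than any visited one, so visits spread across many children.
The sketch only covers the losing-root case and says nothing about which
setting plays better overall.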

Note that the same argument could be made for making it 0, which some
people think the AGZ paper implies, so the above can't be the entire
explanation.

That said, empirical testing indicates that initializing Q(s, a) to the
parent's Q is indeed a well-performing setting for both strong and weak
policy networks.

-- GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go