Hi John,

> You say "the perfect policy network can be derived from the perfect
> value network (the best next move is the move that maximises the
> value for the player, if the value function is perfect), but not
> vice versa.", but a perfect policy for both players can be used to
> generate a perfect playout which yields the perfect value...
>
> regards,
> -John
Thanks for the comment. I have changed the phrasing to make this clearer. What I mean is that deriving the perfect value function from the perfect policy network is less direct: one has to iterate the policy network for both players and then apply an end-game value function to the final position. So the idea is:

VALUE  => POLICY is very direct
POLICY => VALUE  is less direct

which means the value function can be regarded as more essential, and it can be helpful to have methods to train the value function directly.

Regards,
Bo
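P.S. To make the asymmetry concrete, here is a minimal sketch in Python. The game-state interface (legal_moves(), play(move), to_move, is_terminal()) and terminal_value() are hypothetical names used only for illustration, not from any actual codebase.

  # Assumed interface (hypothetical):
  #   state.legal_moves() -> iterable of legal moves
  #   state.play(move)    -> successor state
  #   state.to_move       -> the player about to move
  #   state.is_terminal() -> True when the game is over
  #   terminal_value(s)   -> score of a finished game (end-game value)

  def policy_from_value(state, value):
      # VALUE => POLICY is very direct: one evaluation per legal move,
      # picking the move that maximises the value for the player to move.
      return max(state.legal_moves(),
                 key=lambda move: value(state.play(move), state.to_move))

  def value_from_policy(state, policy, terminal_value):
      # POLICY => VALUE is less direct: iterate the policy for both
      # players to generate a complete playout, then apply an end-game
      # value function to the terminal position.
      while not state.is_terminal():
          state = state.play(policy(state))
      return terminal_value(state)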