Hi John,

>You say "the perfect policy network can be
>derived from the perfect value network (the best next move is the move
>that maximises the value for the player, if the value function is
>perfect), but not vice versa.", but a perfect policy for both players
>can be used to generate a perfect playout which yields the perfect
>value...
>
>regards,
>-John

Thanks for the comment. I have changed the phrasing to make it clearer. What I
mean is that defining the perfect value function from the perfect policy
network is less direct: one has to iterate the policy network to the end of
the game and then apply an end-game value function.

So the idea is: 
VALUE => POLICY is very direct
POLICY => VALUE is less direct
which means the value function can be regarded as more essential, and it
can be helpful to have methods to train the value function directly.
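To make the asymmetry concrete, here is a minimal sketch on a toy subtraction
game (take 1 or 2 stones, taking the last stone wins). The game, the function
names, and the rollout are my own illustration, not anything from the thread:
VALUE => POLICY is a one-step argmax, while POLICY => VALUE needs a full
playout plus an end-game value function.

```python
from functools import lru_cache

def moves(s):
    # legal moves in the toy game: remove 1 or 2 stones
    return [m for m in (1, 2) if m <= s]

@lru_cache(maxsize=None)
def value(s):
    """Perfect value for the player to move: +1 = win, -1 = loss."""
    if s == 0:
        return -1  # the previous player took the last stone
    return max(-value(s - m) for m in moves(s))

def policy_from_value(s):
    # VALUE => POLICY: direct one-step lookahead, pick the move
    # that maximises the value for the player to move
    return max(moves(s), key=lambda m: -value(s - m))

def value_from_policy(s, policy):
    # POLICY => VALUE: less direct; iterate the policy for both
    # players to the end of the game, then apply the end-game value
    sign = 1
    while s > 0:
        s -= policy(s)
        sign = -sign
    return sign * -1  # end-game value is -1 for the player to move at s == 0
```

With a perfect policy, the playout recovers the perfect value
(`value_from_policy(s, policy_from_value) == value(s)` for every `s`), but
only after walking the whole game, which is the indirection the email is
pointing at.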

Regards,
Bo


_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go