Hi John,

> You say "the perfect policy network can be derived from the perfect
> value network (the best next move is the move that maximises the
> value for the player, if the value function is perfect), but not
> vice versa.", but a perfect policy for both players can be used to
> generate a perfect playout which yields the perfect value...
>
> regards,
> -John
Thanks for the comment. I have changed the phrasing to make this clearer. What I mean is that deriving the perfect value function from the perfect policy network is less direct: one has to iterate the policy network for both players and then apply an end-game value function to the final position. So the idea is:

VALUE  => POLICY is very direct
POLICY => VALUE  is less direct

which means the value function can be regarded as more essential, and it can be helpful to have methods to train the value function directly.

Regards,
Bo
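P.S. To make the asymmetry concrete, here is a minimal sketch in Python. The game-state interface (legal_moves(), play(move), to_move, is_terminal()) and terminal_value() are hypothetical names used only for illustration, not from any actual codebase.

  # Assumed interface (hypothetical):
  #   state.legal_moves() -> iterable of legal moves
  #   state.play(move)    -> successor state
  #   state.to_move       -> the player about to move
  #   state.is_terminal() -> True when the game is over
  #   terminal_value(s)   -> score of a finished game (end-game value)

  def policy_from_value(state, value):
      # VALUE => POLICY is very direct: one evaluation per legal move,
      # picking the move that maximises the value for the player to move.
      return max(state.legal_moves(),
                 key=lambda move: value(state.play(move), state.to_move))

  def value_from_policy(state, policy, terminal_value):
      # POLICY => VALUE is less direct: iterate the policy for both
      # players to generate a complete playout, then apply an end-game
      # value function to the terminal position.
      while not state.is_terminal():
          state = state.play(policy(state))
      return terminal_value(state)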