Re: [Computer-go] Training the value network (a possibly more efficient approach)

Bo Peng Wed, 11 Jan 2017 06:32:37 -0800

It's nice to see so many discussions.

Another reason could be that training a good-quality v(s) (or V(s)) may
require a different network structure from that of W(s).


Usually it is helpful to have an ensemble of different networks, each
constructed from different principles.
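
As a rough illustration of the ensemble idea, here is a minimal sketch that
combines the outputs of several evaluators by a (possibly weighted) average.
The names ensemble_value and Evaluator are hypothetical, and it assumes each
network can be wrapped as a function returning a win probability in [0, 1].
Averaging is only one simple way to combine networks.

```python
from statistics import fmean
from typing import Callable, Optional, Sequence

# Hypothetical type: an evaluator maps a position to a win probability in [0, 1].
Evaluator = Callable[[object], float]


def ensemble_value(
    position: object,
    evaluators: Sequence[Evaluator],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine several evaluators into one estimate by (weighted) averaging."""
    if not evaluators:
        raise ValueError("need at least one evaluator")
    predictions = [evaluate(position) for evaluate in evaluators]
    if weights is None:
        # Unweighted mean: every network counts equally.
        return fmean(predictions)
    if len(weights) != len(predictions):
        raise ValueError("weights and evaluators must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

In practice each evaluator would wrap a trained network; the weights could be
tuned on held-out games.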

On 1/11/17, 22:19, "Computer-go on behalf of Gian-Carlo Pascutto"
<computer-go-boun...@computer-go.org on behalf of g...@sjeng.org> wrote:
>
>Combining this with Kensuke's comment, I think it might be worth trying
>to train V(s) and W(s) simultaneously, but with V(s) being the linear
>interpolation depending on move number, not the value function (which
>leaves us without a way to play handicap games and a bunch of other
>benefits).
>
>This could reduce overfitting during training, and if we only use W(s)
>during gameplay we still have the "strong signal" advantage.
>
>-- 
>GCP
>_______________________________________________
>Computer-go mailing list
>Computer-go@computer-go.org
>http://computer-go.org/mailman/listinfo/computer-go
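
To make the quoted idea concrete, here is a minimal sketch of a training
target that is linearly interpolated by move number. It assumes (this is not
spelled out in the quoted message) that the interpolation runs from a neutral
0.5 at the start of the game to the final result at the last move. The
function name and parameters are hypothetical.

```python
def interpolated_target(
    move_number: int,
    game_length: int,
    outcome: float,
    neutral: float = 0.5,
) -> float:
    """Training target that moves linearly from `neutral` to `outcome`.

    Assumption: outcome is 1.0 for a win and 0.0 for a loss, and positions
    early in the game get a target close to `neutral`.
    """
    if game_length <= 0:
        raise ValueError("game_length must be positive")
    # Fraction of the game played so far, clamped to [0, 1].
    progress = min(max(move_number / game_length, 0.0), 1.0)
    return neutral + progress * (outcome - neutral)
```

For example, halfway through a lost 200-move game,
interpolated_target(100, 200, 0.0) gives 0.25, while the final position gets
the real result 0.0.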


