When I read about Facebook's DCNN-based go program, I remembered another paper I'd come across on arXiv, namely "How (not) to train your generative model: scheduled sampling, likelihood, adversary?" by Ferenc Huszar (http://arxiv.org/pdf/1511.05101.pdf).
A lot of that paper went over my head (I am a "half-studied scoundrel", as we say in Norway), but I think I sort of got his speculation at the end, and it made a lot of sense to me. He argues that which side you approach the K-L divergence from matters, so to speak, for what kind of errors you get when the model falls short, and that when you're generating, as opposed to predicting, the goal should be to minimize the K-L divergence the "other" way around.

When you're using a DCNN in a go program, you are really doing generation, not prediction, right? You want to generate a good move. A model that generates "flashy" moves that LOOK really strong, but could potentially be very bad, would be a good predictor, but a bad generator. The ideal probability distribution is the distribution of moves a pro would make. But to the degree your model falls short, you want to minimize the chance of making a wildly "un-pro" move, rather than maximizing the chance of making a "pro" move. Since these are probability distributions, those two things are not the same unless your model is perfect (right?).

If my understanding is correct (and it's quite possible I'm way off course, I'm an amateur! sorry for wasting your time if so!), then rather than training a move predictor, they should use the adversarial methods which are also in the wind now to train a generative model. -- Harald Korneliussen
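To make the direction-of-KL point concrete, here is a small toy sketch (my own made-up numbers, not from Huszar's paper or any actual go program): a hypothetical "pro" distribution over four candidate moves, and two imperfect models, one of which puts noticeable mass on a move the pro almost never plays. The forward divergence KL(pro||model) is what maximum-likelihood move prediction effectively minimizes; the reverse divergence KL(model||pro) is the direction that punishes generating un-pro moves.

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical pro distribution over four candidate moves A-D;
# move D is a near-blunder the pro almost never plays.
pro = [0.50, 0.30, 0.1999, 0.0001]

# Two imperfect models: one stays close to the pro everywhere,
# one "flashy" model puts 20% of its mass on move D.
model_safe   = [0.45, 0.35, 0.19, 0.01]
model_flashy = [0.40, 0.25, 0.15, 0.20]

for name, q in [("safe", model_safe), ("flashy", model_flashy)]:
    print(f"{name:6s}  forward KL(pro||model) = {kl(pro, q):.3f}   "
          f"reverse KL(model||pro) = {kl(q, pro):.3f}")

# The flashy model's mass on the near-blunder move barely registers in the
# forward KL (the pro hardly ever plays it, so it carries almost no weight
# there), but it blows up the reverse KL -- matching the intuition that a
# generator should above all avoid wildly un-pro moves.
```

Running this, the two models look fairly comparable under the forward KL, while the flashy model is dramatically worse under the reverse KL, which is the behaviour the argument above relies on.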