I tried to reimplement the system in a simplified way, looking for the minimum that learns to play 5x5 in a few thousand self-play games. It turns out that several components are important for avoiding some obvious attractors (like the network predicting that Black loses on every move from its second game on):

  - Disabling resignation in a portion of games is essential, not just
    for tuning the resignation threshold (if you even want to do that),
    but to correct the prediction signal with actual scoring, rather
    than letting the network start to always resign early in the game.

  - Dirichlet (or other) noise is essential to keep the network from
    getting looped into the same game, which is also self-reinforcing.

  - I have my doubts about the idea of high-temperature move choices at
    the beginning, especially with T=1 ... maybe that's just bad very
    early in the training.

On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> The order of magnitude matches my parameter numbers.  (My attempt to
> reproduce a simplified version of this is currently evolving at
> https://github.com/pasky/michi/tree/nnet but the code is a mess right
> now.)

-- 
				Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
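
P.S. For illustration, here is a minimal sketch of the three mechanisms from the list (resignation disabled in a fraction of games, Dirichlet noise mixed into the root priors, temperature-based move choice). This is not the code from the michi nnet branch. All function names and default values are hypothetical placeholders, not settings reported in this thread, and the value estimate is assumed to lie in [-1, 1] from the point of view of the player to move.

```python
import random


def resignation_allowed(disable_fraction=0.1, rng=random):
    """Decide once per self-play game whether resigning is permitted.

    With probability `disable_fraction` (a placeholder value) the game is
    played out to final scoring, so the value target comes from the actual
    result rather than from the network's own early prediction.
    """
    return rng.random() >= disable_fraction


def should_resign(value_estimate, resign_threshold, allowed):
    """Resign only if this game allows it and the estimate is bad enough."""
    return allowed and value_estimate < resign_threshold


def add_dirichlet_noise(priors, alpha=0.3, eps=0.25, rng=random):
    """Return (1 - eps) * priors + eps * eta, with eta ~ Dirichlet(alpha).

    The Dirichlet sample is built from normalized Gamma(alpha, 1) draws.
    `alpha` and `eps` are assumptions chosen for illustration only.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    if total == 0.0:
        # Degenerate underflow case: fall back to the unmodified priors.
        return list(priors)
    return [(1.0 - eps) * p + eps * g / total for p, g in zip(priors, gammas)]


def pick_move(visit_counts, temperature, rng=random):
    """Choose a move index from root visit counts.

    temperature == 0 picks the most visited move; temperature == 1 samples
    proportionally to the visit counts. Assumes at least one nonzero count.
    """
    if temperature <= 0:
        return max(range(len(visit_counts)), key=lambda i: visit_counts[i])
    weights = [n ** (1.0 / temperature) for n in visit_counts]
    return rng.choices(range(len(visit_counts)), weights=weights)[0]
```

In a self-play loop one would call resignation_allowed() once at the start of each game, apply add_dirichlet_noise() to the root priors before each search, and pass temperature=0 to pick_move() to try out the "no high-temperature opening" variant.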