On Wed, Dec 06, 2017 at 09:57:42AM -0800, Darren Cook wrote:
> > Mastering Chess and Shogi by Self-Play with a General Reinforcement
> > Learning Algorithm
> > https://arxiv.org/pdf/1712.01815.pdf
>
> One of the changes they made (bottom of p.3) was to continuously update
> the neural net, rather than require a new network to beat it 55% of the
> time to be used. (That struck me as strange at the time, when reading
> the AlphaGoZero paper - why not just >50%?)
Yes, that also struck me. I think it's good news for the community to see
it reported that this works, as it makes the training process much more
straightforward. They also use just 800 simulations, which is more good
news. (Both were among the first tradeoffs I made in Nochi.)

Another interesting tidbit: they also use the TPUs to generate the
self-play games.

> The AlphaZero paper shows it out-performs AlphaGoZero, but they are
> comparing to the 20-block, 3-day version. Not the 40-block, 40-day
> version that was even stronger.
>
> As papers rarely show failures, can we take it to mean they couldn't
> out-perform their best go bot, do you think? If so, I wonder how hard
> they tried?

IMHO the most likely explanation is that this research has been going on
for a while, and when they started in this direction, that early version
was their state-of-the-art baseline. This kind of chronology, with the
40-block version being almost "a last-minute addition", is IMHO apparent
even in the text of the Nature paper. Also, the 3-day version simply had
roughly the same training time available as AlphaZero did.

-- 
				Petr Baudis, Rossum
	Run before you walk! Fly before you crawl! Keep moving forward!
	If we fail, I'd rather fail really hugely.  -- Moist von Lipwig
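
For concreteness, here is a minimal sketch of the difference discussed at
the top of this thread: AlphaGo Zero's 55% evaluation gate versus
AlphaZero's continuous update. This is illustrative pseudocode only; the
helpers selfplay_games, train and win_rate are invented placeholders, not
taken from either paper or from Nochi:

    # Rough sketch of the two update schemes; NOT DeepMind's code.
    # selfplay_games, train and win_rate are stand-ins for real
    # self-play generation, gradient updates and evaluation matches.
    import random

    def selfplay_games(net, n=100):
        # placeholder: games generated with `net` guiding ~800-simulation MCTS
        return [random.random() for _ in range(n)]

    def train(net, games):
        # placeholder: a few gradient steps on the game data; returns a new net
        return net + len(games)

    def win_rate(candidate, incumbent):
        # placeholder: head-to-head evaluation match between two nets
        return random.random()

    def alphago_zero_loop(net, iterations=10):
        # AlphaGo Zero style: a candidate only replaces the self-play
        # network after winning >= 55% of evaluation games against it.
        best = net
        for _ in range(iterations):
            candidate = train(best, selfplay_games(best))
            if win_rate(candidate, best) >= 0.55:
                best = candidate
        return best

    def alpha_zero_loop(net, iterations=10):
        # AlphaZero style: no gate; the latest network always generates
        # the next batch of self-play games.
        for _ in range(iterations):
            net = train(net, selfplay_games(net))
        return net

The only structural change is that the second loop never runs an
evaluation match, so every batch of self-play games comes from the
freshest weights.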