Since before AlphaGo was announced, I have thought the way forward was to generate games that are played to the bitter end while maximizing score, and then to use the final ownership as something to predict. I am very glad that someone has had the time to put this idea (and many others!) into practice. Congratulations on a very compelling paper.
Álvaro.

On Sun, Mar 3, 2019 at 9:21 PM David Wu <lightvec...@gmail.com> wrote:
>
> For any interested people on this list who don't follow Leela Zero discussion
> or reddit threads:
>
> I recently released a paper on ways to improve the efficiency of
> AlphaZero-like learning in Go. A variety of the ideas tried deviate a little
> from "pure zero" (e.g. ladder detection, predicting board ownership), but
> still only uses self-play starting from random and with no outside human data.
>
> Although longer training runs have NOT yet been tested, for reaching up to
> about LZ130 strength so far (strong human pro or just beyond it, depending on
> hardware), you can speed up the learning to that point by roughly a factor of
> 5 at least compared to Leela Zero, and closer to a factor of 30 for merely
> reaching the earlier level of very strong amateur strength rather than pro or
> superhuman.
>
> I found some other interesting results, too - for example contrary to
> intuition built up from earlier-generation MCTS programs in Go, putting
> significant weight on score maximization rather than only win/loss seems to
> help.
>
> Blog post: https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> Paper: https://arxiv.org/abs/1902.10565
> Code: https://github.com/lightvector/KataGo
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go