Since before AlphaGo was announced, I have thought the way forward was
to generate games played to the bitter end while maximizing score, and
then to use the final ownership as something to predict. I am very glad
that someone has had the time to put this idea (and many others!) into
practice. Congratulations on a very compelling paper.

Álvaro.


On Sun, Mar 3, 2019 at 9:21 PM David Wu <lightvec...@gmail.com> wrote:
>
> For any interested people on this list who don't follow Leela Zero discussion 
> or reddit threads:
>
> I recently released a paper on ways to improve the efficiency of 
> AlphaZero-like learning in Go. A variety of the ideas tried deviate a little 
> from "pure zero" (e.g. ladder detection, predicting board ownership), but 
> still use only self-play starting from random and with no outside human data.
>
> Although longer training runs have NOT yet been tested, for reaching up to 
> about LZ130 strength so far (strong human pro or just beyond it, depending on 
> hardware), you can speed up the learning to that point by roughly a factor of 
> 5 at least compared to Leela Zero, and closer to a factor of 30 for merely 
> reaching the earlier level of very strong amateur strength rather than pro or 
> superhuman.
>
> I found some other interesting results, too - for example contrary to 
> intuition built up from earlier-generation MCTS programs in Go, putting 
> significant weight on score maximization rather than only win/loss seems to 
> help.
>
> Blog post: https://blog.janestreet.com/accelerating-self-play-learning-in-go/
> Paper: https://arxiv.org/abs/1902.10565
> Code: https://github.com/lightvector/KataGo
>
>
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go