Hi Ichikawa san,

Thank you for the nice explanation. I think your guess may well be right, and the 2018 paper might have no mistake.
I had carefully checked Figure 1 in both papers.
1. The 2017 version reaches AlphaGo Lee in 170,000 steps; the 2018 version reaches it in 80,000 steps.
2. The 2017 and 2018 versions reach "AlphaGo Zero (20 block)" in a similar number of steps.
3. The final strength is similar.

So I had thought, "If you use 7 times more game records, the initial learning is faster, but the final strength is similar." So maybe they wanted to say "21 million training games is enough." But it is wrong. In Go, if you use all positions from a game, does it cause overfitting, so that learning fails? Without symmetry augmentation (a small sketch is in the P.S. below), Go can use only 20 positions from a game; chess and shogi are OK. It looks domain dependent...

Thanks,
Hiroshi Yamashita
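P.S. To make clear what I mean by symmetry augmentation, here is a rough sketch in Python (my own illustration, not code from the papers), assuming the position is a numpy array:

import numpy as np

def dihedral_symmetries(board):
    # return the 8 rotations/reflections of a square Go board array
    syms = []
    for k in range(4):                   # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(board, k)
        syms.append(rotated)
        syms.append(np.fliplr(rotated))  # mirror of each rotation
    return syms

# one 19x19 position becomes 8 training samples when symmetry-augmented
# (the policy target must of course be transformed in the same way)
position = np.zeros((19, 19), dtype=np.int8)
print(len(dihedral_symmetries(position)))  # -> 8

Chess and shogi positions do not have these symmetries (the rules are not rotation-invariant), so this trick only helps in Go.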
The Go version in AlphaZero 2017 finished training in 34 hours according to Table S3, and AlphaZero Symmetries in AlphaZero 2018 appears to finish training in the same time according to Figure S1. So I think the authors adopted AlphaZero Symmetries in the 2017 paper by mistake and redid the experiment for the 2018 paper. To compensate for the symmetries with real self-play, they generated 8 times more games and reduced the positions per game to 1/8. It is just my guess^^
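To put the compensation in numbers, a minimal bookkeeping sketch (the positions-per-game figure is only an assumed value for illustration):

# all concrete figures are illustrative; only the ratios matter
games_2017    = 21_000_000        # Go training games in the 2017 preprint
positions_old = 160               # assumed number of positions kept per game

# without symmetry augmentation: 8x more games, but only 1/8 of the positions per game
games_2018    = 8 * games_2017
positions_new = positions_old // 8   # -> 20 positions per game

# the total number of training positions is unchanged...
assert games_2018 * positions_new == games_2017 * positions_old
# ...but the positions now come from 8x as many independent games,
# standing in for the 8 synthetic rotations/reflections used with symmetries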