On Fri, Mar 11, 2016 at 09:33:52AM +0100, Robert Jasiek wrote:
> On 11.03.2016 08:24, Huazuo Gao wrote:
> > Points at the center of the board indeed depend on the full board, but
> > points near the edge do not.
>
> I have been wondering why AlphaGo could improve a lot between the Fan Hui
> and Lee Sedol matches, including learning sente and showing greater signs
> of more global, more long-term planning. A rumour so far suggests that the
> time was used for more learning, but I'd be surprised if this alone
> sufficed.
My personal hypothesis so far is that it might - REINFORCE might scale
amazingly well, and continued application of it (or possibly more frequent
sampling to get more data points; once per game always seemed quite
conservative to me) could make AlphaGo amazingly strong.  We know that
after 30 million self-play games the RL value network bumps the strength by
~450 Elo, but what about after 300 million self-play games?  (Possibly
after training the RL policy further too.)

(My main clue for this was the comment that current AlphaGo self-play games
already look quite different from human games.  Another explanation for
that might be that they found a way to replace the SL policy with the RL
policy in the tree.)

-- 
        Petr Baudis
        If you have good ideas, good data and fast computers,
        you can do almost anything.  -- Geoffrey Hinton
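
P.S.: To make the sampling question concrete, here is a tiny REINFORCE
sketch in Python.  It is not AlphaGo's pipeline - the stateless 9-way
softmax "policy", the random +/-1 game outcome and the samples_per_game
knob are all made up for illustration, and with a random outcome it learns
nothing - it only shows where "how many positions per self-play game
contribute a gradient sample" enters the update:

    import math
    import random

    N_MOVES = 9              # toy action space standing in for board points
    theta = [0.0] * N_MOVES  # tabular softmax "policy" (stand-in for the policy net)

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def play_selfplay_game(game_len=20):
        # Sample a toy "game": a sequence of moves from the current policy,
        # plus a random +/-1 outcome standing in for the game result.
        traj = []
        for _ in range(game_len):
            probs = softmax(theta)
            a = random.choices(range(N_MOVES), weights=probs)[0]
            traj.append((a, probs))
        z = random.choice([+1.0, -1.0])
        return traj, z

    def reinforce_update(traj, z, samples_per_game=1, lr=0.01):
        # REINFORCE step: nudge log pi(a) up if z > 0, down if z < 0,
        # for each sampled position.  samples_per_game is the knob
        # discussed above: 1 position per game vs. many.
        for a, probs in random.sample(traj, min(samples_per_game, len(traj))):
            for i in range(N_MOVES):
                # d/dtheta_i log softmax(theta)[a] = 1{i == a} - probs[i]
                theta[i] += lr * z * ((1.0 if i == a else 0.0) - probs[i])

    for _ in range(1000):
        traj, z = play_selfplay_game()
        reinforce_update(traj, z, samples_per_game=1)  # try e.g. 10 for more data per game

With samples_per_game=1 each self-play game contributes a single gradient
sample; raising it multiplies the data extracted per game, at the price of
more strongly correlated samples from the same game.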