On Fri, Mar 11, 2016 at 09:33:52AM +0100, Robert Jasiek wrote:
> On 11.03.2016 08:24, Huazuo Gao wrote:
> >Points at the center of the board indeed depend on the full board, but
> >points near the edge do not.
> 
> I have been wondering why AlphaGo could improve so much between the Fan Hui
> and Lee Sedol matches, incl. learning sente and showing greater signs of
> global, long-term planning. A rumour so far suggests they simply used the
> time for more learning, but I'd be surprised if that alone sufficed.

My personal hypothesis so far is that it might: REINFORCE may scale
remarkably well, and simply continuing to apply it (or sampling more
frequently to get more data points; once per game always seemed quite
conservative to me) could make AlphaGo amazingly strong.  We know that
after 30 million self-play games the RL value network bumps the strength
by ~450 Elo, but what about after 300 million self-play games?
(Possibly after training the RL policy further too.)
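
For concreteness, here is a minimal sketch of the kind of REINFORCE
update I mean, applied to a toy linear softmax policy instead of the
actual conv net (my own illustration; the shapes, helper names and
learning rate are made up, not taken from the paper):

    import numpy as np

    BOARD_POINTS = 19 * 19

    # Toy stand-in for the policy network: one linear softmax layer.
    # AlphaGo's policy is a deep conv net, but the update rule is the same.
    W = np.zeros((BOARD_POINTS, BOARD_POINTS))

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def reinforce_update(games, lr=0.01):
        """games: (features, move, outcome) triples from self-play, where
        outcome is +1 if the player to move won the game and -1 otherwise."""
        global W
        for feats, move, z in games:
            p = softmax(W @ feats)
            # gradient of log pi(move | feats) for a linear softmax policy
            grad_log = np.outer(-p, feats)
            grad_log[move] += feats
            # REINFORCE step: reinforce moves from won games,
            # suppress moves from lost games
            W += lr * z * grad_log

The point about sampling is just that you could take more than one
(features, move, outcome) triple per self-play game.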

(My main clue for this was the comment that current AlphaGo self-play
games already look quite different from human games.  Another
explanation for that might be that they found a way to replace the SL
policy with the RL policy in the tree.)
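
(For reference, and only as my own reading of the paper: the SL policy
enters the search as the prior P(s,a) in the PUCT selection rule, so
"replacing the SL policy with the RL policy in the tree" would just
change where that prior comes from.  A rough sketch of the rule, with
invented field names:

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        prior: float              # P(s,a) from the policy net (SL or RL)
        visits: int = 0
        value_sum: float = 0.0
        children: list = field(default_factory=list)

        @property
        def value_avg(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node, c_puct=5.0):
        # PUCT: argmax_a  Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
        total = sum(ch.visits for ch in node.children)
        return max(node.children,
                   key=lambda ch: ch.value_avg
                       + c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits))

Everything else in the search stays the same; only the source of the
prior changes.)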

-- 
                                Petr Baudis
        If you have good ideas, good data and fast computers,
        you can do almost anything. -- Geoffrey Hinton
