On Fri, Nov 10, 2017 at 03:40:27PM +0100, Gian-Carlo Pascutto wrote:
> On 10/11/2017 1:47, Petr Baudis wrote:
> >   * AlphaGo used 19 resnet layers for 19x19, so I used 7 layers for 7x7.
>
> How many filters per layer?
256 like AlphaGo.

> FWIW 7 layer resnet (14 + 2 layers) is still pretty huge - larger than
> the initial AlphaGo.  Given the amount of games you have, and the size of
> the board, I would not be surprised if your neural net program is
> "outbooking" the opponent by remembering the sequences rather than
> learning more generic things.
>
> (But hey, outbooking is learning too!)

I couldn't exclude this, yes.  It would be interesting to try to use the
same convolutions on a bigger board to see if they play shapes and can do
basic tactics.

> >   * The neural network is updated after _every_ game, _twice_, on _all_
> >     positions plus 64 randomly sampled positions from the entire
> >     history, this all done four times - on the original position and the
> >     three symmetry flips (but I was too lazy to implement 90-degree
> >     rotation).
>
> The reasoning being to give a stronger and faster reinforcement with the
> latest data?

Yes.

> >   * The value function is trained with cross-entropy rather than MSE,
> >     no L2 regularization, and plain Adam rather than hand-tuned SGD
> >     (but the annealing is reset from time to time due to manual
> >     restarts of the script from a checkpoint).
>
> I never really had good results with Adam and friends compared to SGD
> (even momentum does not always help - but of course it's much faster
> early on).

It has worked great on all my neural models in other tasks - but this is
actually my first neural model for Go. :)

> >   * There is no resign auto-threshold, but it is important to play 25%
> >     of the games without resigning, to escape local "optima".
>
> This makes sense because both sides will miscount in exactly the same way.

Without this, producing value 1.0 for one color and 0.0 for the other is
a super-strong attractor.

> >   * 1/Temperature is 2 for the first three moves.
> >   * Initially I used 1000 "simulations" per move, but by mistake the
> >     last 1500 games, when the network improved significantly (see
> >     below), were run with 2000 simulations per move.  So that might
> >     matter.
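To be concrete about the symmetry augmentation: each position is trained
on four times - the original plus the three flips, with no 90-degree
rotation.  A minimal numpy sketch of just the board transforms (purely
illustrative, not the actual training code):

```python
import numpy as np

def symmetry_variants(board):
    """The four variants each position is trained on: the original,
    the two axis flips, and both flips together (which equals a
    180-degree rotation).  90-degree rotations are deliberately
    left out, as described above."""
    return [
        board,                        # original position
        np.flipud(board),             # flip top-to-bottom
        np.fliplr(board),             # flip left-to-right
        np.flipud(np.fliplr(board)),  # both flips = 180-degree rotation
    ]
```

(The corresponding policy targets would of course have to be transformed
the same way.)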
> > This has been running for two weeks, self-playing 8500 games.  A week
> > ago its moves already looked a bit natural, but it was stuck in
> > various local optima.  Three days ago it beat GNUGo once across 20
> > games.  Now five times across 20 games - so I'll let it self-play a
> > little longer, as it might surpass GNUGo quickly at this point?  Also,
> > this late improvement coincides with the increased simulation number.
>
> The simulation number is one of the big black boxes in this setup, I
> think.  If the policy network does not have a strong opinion yet, it
> seems that one has to make it sufficiently bigger than the number of
> legal moves.  If not, first-play-urgency will cause every successor
> position to be evaluated and there's no look-ahead, which means MCTS
> can't discover anything.

I don't see how first-play-urgency comes into play.  Initially it'll
typically be noise, but that still means growing the tree pretty
asymmetrically.  I saw uniform sampling only in some cases where the
number of simulations was << the number of children.

> So a few times 361 makes sense for 19x19, but don't ask me why 1600 and
> not 1200 etc.

My feeling now is that raising the count really helps, especially
slightly later on.  I think the moment is when you stop seeing regular,
large discrepancies between network predictions and scoring output in
the very late endgame.  But it could be an illusion.

> With only 50-ish moves to consider on 7x7, it's interesting that you see
> a big improvement by making it (relatively) much larger than DeepMind did.
>
> But uh, you're not simply matching it against GNUGo with more
> simulations, are you?  I mean it would be quite normal to win more when
> searching deeper.

All playtests should have been with 2000 simulations.

-- 
				Petr Baudis, Rossum
	Run before you walk!  Fly before you crawl!  Keep moving forward!
	If we fail, I'd rather fail really hugely.
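P.S. - to make the first-play-urgency point above concrete: with an
optimistic value for unvisited children, PUCT-style selection will expand
every legal move once before revisiting any, so with fewer simulations
than children there is effectively no look-ahead.  A hedged sketch - the
c_puct and FPU constants here are arbitrary illustrations, not anything
either of us actually uses:

```python
import math

def puct_select(children, c_puct=1.5, fpu_value=1.0):
    """Pick the child maximizing Q + U, AlphaGo-style.  Each child is a
    dict with 'visits', 'value' (sum of backed-up values) and 'prior'.
    Unvisited children get an optimistic first-play-urgency value, so
    they are tried before any visited child is revisited."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        # Q: mean value, or the optimistic FPU constant if unvisited.
        q = ch["value"] / ch["visits"] if ch["visits"] else fpu_value
        # U: exploration bonus; +1 keeps it nonzero on the first pass.
        u = c_puct * ch["prior"] * math.sqrt(total + 1) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)
```

With e.g. five children and five simulations, each selection lands on a
previously unvisited child, i.e. the tree is sampled uniformly rather
than grown asymmetrically.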
				-- Moist von Lipwig

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go