When I did something like this for Spanish checkers (training a neural
network to be the evaluation function in an alpha-beta search, without any
human knowledge), I solved the problem of adding variety to the games by
using UCT for the opening moves. That is, I kept a tree structure of
opening moves and used the UCB1 formula to pick the next move for as long
as the game stayed inside the tree. Once outside the tree, I used
alpha-beta search to play out a normal [very fast] game.
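
In case it is useful, here is a rough Python sketch of what the book
structure and the UCB1 selection can look like (the names BookNode,
select_book_move and the exploration constant are made up for
illustration; this is not my actual code):

    import math
    import random

    class BookNode:
        """One position in the opening book.  Statistics are kept from the
        point of view of the player who made the move into this node."""
        def __init__(self):
            self.visits = 0
            self.wins = 0.0
            self.children = {}    # move -> BookNode

    def ucb1(parent, child, c=1.4):
        """Standard UCB1 score; unvisited children score +infinity, so
        every legal move gets tried at least once."""
        if child.visits == 0:
            return float('inf')
        return (child.wins / child.visits
                + c * math.sqrt(math.log(parent.visits) / child.visits))

    def select_book_move(node, legal_moves):
        """Pick the next opening move with UCB1, creating children lazily.
        Ties (e.g. several untried moves) are broken at random."""
        for m in legal_moves:
            node.children.setdefault(m, BookNode())
        move, child = max(node.children.items(),
                          key=lambda kv: (ucb1(node, kv[1]), random.random()))
        return move, child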

One important characteristic of this UCT opening-book builder is that the
last move inside the tree is essentially random, so it explores a lot of
unbalanced positions.
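
Concretely, one iteration of the book builder could look like this (again
only a sketch, reusing BookNode and select_book_move from above; new_game,
alphabeta_move and result_for stand in for whatever game and engine
interface you have):

    def play_one_game(root, new_game, alphabeta_move, result_for):
        """One self-play game: UCB1 inside the book tree, alpha-beta
        outside, then back the final result up through the in-book path."""
        game = new_game()
        root.visits += 1          # the root only needs visits for UCB1's log term
        node, path = root, []     # path holds (child, player who moved into it)
        while node is not None and not game.over():
            mover = game.player_to_move()
            move, child = select_book_move(node, game.legal_moves())
            game.play(move)
            path.append((child, mover))
            # Stop descending at the first never-visited node; the move that
            # created it was chosen through UCB1's infinite bonus, i.e. more
            # or less at random among the untried moves.
            node = child if child.visits > 0 else None

        while not game.over():    # outside the book: a normal, fast alpha-beta game
            game.play(alphabeta_move(game))

        for child, mover in path: # back up the result through the book
            child.visits += 1
            child.wins += result_for(game, mover)   # 1 win / 0.5 draw / 0 loss

Each game leaves the book at the first child with zero visits, and that
child then gets its first visit during the backup, so the set of visited
book nodes grows by roughly one per game, just as in plain UCT.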

Álvaro.



On Fri, Oct 20, 2017 at 9:23 AM, Petr Baudis <pa...@ucw.cz> wrote:

>   I tried to reimplement the system - in a simplified way, trying to
> find the minimum that learns to play 5x5 in a few thousand self-play
> games.  It turns out there are several components that are important
> for avoiding some obvious attractors (like the network predicting that
> Black loses on every move from its second game on):
>
>   - disabling resignation in a portion of the games is essential not
>     just for tuning the resignation threshold (if you even want to do
>     that), but also to correct the prediction signal with actual scoring
>     rather than having games always resign early
>
>   - Dirichlet (or other) noise is essential to keep the network from
>     getting looped into the same game - which is also self-reinforcing
>
>   - I have my doubts about the idea of high-temperature move choices
>     at the beginning of the game, especially with T=1 ... maybe that's
>     just bad very early in the training
>
> On Thu, Oct 19, 2017 at 02:23:41PM +0200, Petr Baudis wrote:
> >   The order of magnitude matches my parameter numbers.  (My attempt to
> > reproduce a simplified version of this is currently evolving at
> > https://github.com/pasky/michi/tree/nnet but the code is a mess right
> > now.)
>
> --
>                                         Petr Baudis, Rossum
>         Run before you walk! Fly before you crawl! Keep moving forward!
>         If we fail, I'd rather fail really hugely.  -- Moist von Lipwig