Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Eric Boesch Wed, 06 Dec 2017 18:56:50 -0800

I could be drawing wrong inferences from incomplete information, but as
Darren pointed out, this paper does leave the impression that Alpha Zero is
not as strong as the real AlphaGo Zero, in which case it would be clearer to
say so explicitly. Of course the chess and shogi results are impressive
regardless. (In chess, 28 wins out of 100 is good, but 0 losses is even
better. Entering a drawn sequence from an inferior position -- such as
playing black -- is a desirable result for even a perfect program without
contempt, so failing to win as black is not a good indicator of strength.)
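To make the scoring concrete: counting a win as 1 point and a draw as 1/2,
the per-color records Richard quotes below (25-25-0 as white, 3-47-0 as
black) add up to the 64-36 total Brian mentions. The minimal Python sketch
below converts each score into the rating gap it would imply, assuming the
standard logistic Elo model; the function names are my own, and 50 games
per color is far too small a sample for these gaps to be precise.

```python
import math


def match_score(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored: win = 1, draw = 1/2, loss = 0."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_gap(score: float) -> float:
    """Rating difference implied by a score, assuming the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# AlphaZero's record against Stockfish by color, as quoted in the thread.
as_white = (25, 25, 0)
as_black = (3, 47, 0)
overall = tuple(w + b for w, b in zip(as_white, as_black))

for label, record in [("white", as_white), ("black", as_black), ("overall", overall)]:
    s = match_score(*record)
    print(f"{label:8s} score {s:.2f}  implied gap {elo_gap(s):+.0f} Elo")
```

This prints an overall score of 0.64 and a gap of about +100 Elo, in line
with Brian's estimate, against about +191 as white and +21 as black. The
per-color figures mix White's first-move advantage with playing strength,
which is exactly why the black result says little about strength.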


Comparing the Elo charts in this new paper and in the Nature paper on
AlphaGo Zero, and assigning AlphaGo Lee a reference rating of 0 Elo, I get
the following order of go strength, weakest first: Alpha Zero (~900 Elo),
AlphaGo Master (~1400 Elo), then the full-strength AlphaGo Zero (~1500 Elo).
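Only the differences between these ratings matter, so the 0 anchor is
arbitrary. Assuming the standard logistic Elo model, and assuming the two
papers' charts share a calibration (nothing in either paper guarantees it),
a short sketch turns the gaps into expected scores for Alpha Zero:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Rough readings of the two papers' charts, with AlphaGo Lee pinned at 0 Elo.
ratings = {
    "AlphaGo Lee": 0,
    "Alpha Zero": 900,
    "AlphaGo Master": 1400,
    "AlphaGo Zero": 1500,
}

for opponent in ("AlphaGo Lee", "AlphaGo Master", "AlphaGo Zero"):
    p = expected_score(ratings["Alpha Zero"], ratings[opponent])
    print(f"Alpha Zero vs {opponent}: expected score {p:.3f}")
```

If those chart readings hold, a 600-point deficit means Alpha Zero would be
expected to score only about 3% against the full-strength AlphaGo Zero.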

I would also think Alpha Zero's 8 hours of training on an immense network
of 5,000 first-generation TPUs is more expensive than AlphaGo Zero's 3-day
training of the 20-block network on 4 second-generation TPUs, and faster
only in a strictly chronological sense.
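Taking the figures above at face value, the raw device-hour totals are easy
to compare. This is only a rough proxy for cost: it ignores the throughput
difference between TPU generations and any hardware not mentioned here.

```python
# Raw accelerator-hours for the two training runs, using the figures above.
# First- and second-generation TPUs differ in throughput, so these totals are
# device-hours, not a like-for-like measure of compute.
alpha_zero_tpu_hours = 5_000 * 8      # 5,000 first-gen TPUs for 8 hours
alphago_zero_tpu_hours = 4 * 3 * 24   # 4 second-gen TPUs for 3 days

print(alpha_zero_tpu_hours, alphago_zero_tpu_hours)
print(f"ratio ~{alpha_zero_tpu_hours / alphago_zero_tpu_hours:.0f}x")
```

That works out to 40,000 versus 288 device-hours, roughly a 139x difference.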


On Wed, Dec 6, 2017 at 4:29 PM, Brian Sheppard via Computer-go <
computer-go@computer-go.org> wrote:

> The chess result is 64-36: a 100 rating point edge! I think the Stockfish
> open source project improved Stockfish by ~20 rating points in the last
> year. Given the number of people/computers involved, Stockfish’s annual
> effort level seems comparable to the AZ effort.
>
> Stockfish is really, really tweaked out to do exactly what it does. It is
> very hard to improve anything about Stockfish. To be clear: I am not
> disparaging the code or people or project in any way. The code is great,
> people are great, project is great. It is really easy to work on Stockfish,
> but very hard to make progress given the extraordinarily fine balance of
> resources that already exists.  I tried hard for about 6 months last year
> without any successes. I tried dozens (maybe 100?) experiments, including
> several that were motivated by automated tuning or automated searching for
> opportunities. No luck.
>
> AZ would dominate the current TCEC. Stockfish didn’t lose a game in the
> semi-final, failing to make the final because of too many draws against the
> weaker players.
>
> The Stockfish team will have some self-examination going forward for sure.
> I wonder what they will decide to do.
>
> I hope this isn’t the last we see of these DeepMind programs.
>
> *From:* Computer-go [mailto:computer-go-boun...@computer-go.org] *On
> Behalf Of *Richard Lorentz
> *Sent:* Wednesday, December 6, 2017 12:50 PM
> *To:* computer-go@computer-go.org
> *Subject:* Re: [Computer-go] Mastering Chess and Shogi by Self-Play with
> a General Reinforcement Learning Algorithm
>
> One chess result stood out for me, namely, just how much easier it was for
> AlphaZero to win with white (25 wins, 25 draws, 0 losses) rather than with
> black (3 wins, 47 draws, 0 losses).
>
> Maybe we should not give up on the idea of White to play and win in chess!
>
> On 12/06/2017 01:24 AM, Hiroshi Yamashita wrote:
>
> Hi,
>
> DeepMind has made the strongest chess and shogi programs using the
> AlphaGo Zero method.
>
> Mastering Chess and Shogi by Self-Play with a General Reinforcement
> Learning Algorithm
> https://arxiv.org/pdf/1712.01815.pdf
>
> AlphaZero(Chess) outperformed Stockfish after 4 hours,
> AlphaZero(Shogi) outperformed elmo after 2 hours.
>
> Search is MCTS.
> AlphaZero(Chess) searches     80,000 positions/sec.
> Stockfish        searches 70,000,000 positions/sec.
> AlphaZero(Shogi) searches     40,000 positions/sec.
> elmo             searches 35,000,000 positions/sec.
>
> Thanks,
> Hiroshi Yamashita
>
> _______________________________________________
> Computer-go mailing list
> Computer-go@computer-go.org
> http://computer-go.org/mailman/listinfo/computer-go
>