I read through that paper, but I admit that I didn't really get where the extra
power comes from.
-Original Message-
From: valkyria
To: Computer-go
Sent: Mon, Nov 25, 2019 6:41 am
Subject: [Computer-go] MuZero - new paper from DeepMind
Hi,
if anyone still gets email from this list:
D
I remember a scheme (from Dave Dyer, IIRC) that indexed positions based on the
points on which the 20th, 40th, 60th,... moves were made. IIRC it was nearly a
unique key for pro positions.
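Something like this, roughly (just a sketch; the sampling interval and count are illustrative, and moves are assumed to be board coordinates in game order):

def dyer_key(moves, interval=20, samples=3):
    """Index a game by the points on which moves 20, 40, 60, ... were played.

    `moves` is a list of (col, row) coordinates in game order. Games shorter
    than the last sampled move get None entries, so the key is still usable.
    """
    key = []
    for i in range(1, samples + 1):
        n = i * interval
        key.append(tuple(moves[n - 1]) if n <= len(moves) else None)
    return tuple(key)

# Two pro games almost never share the same key, so a dictionary keyed on
# dyer_key(moves) works as a near-unique game index.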
Best,
Brian
-Original Message-
From: Erik van der Werf
To: computer-go
Sent: Tue, Sep 17, 2019 5:5
>> contrary to intuition built up from earlier-generation MCTS programs in Go,
>> putting significant weight on score maximization rather than only
>> win/loss seems to help.
This narrative glosses over important nuances.
Collectively we are trying to find the golden mean of cost efficiency...
Thanks for the explanation. I agree that there is no actual consistency in
exploration terms across historical papers.
I confirmed that the PUCT formulas across the AG, AGZ, and AZ papers are all
consistent. That is unlikely to be an error. So now I am wondering whether the
faster decay is usef
Subject: Re: [Computer-go] PUCT formula
On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall
> the result. What is the formula that AGZ actually uses?
The one mentioned in their paper, I assume.
I investigated both
In the AGZ paper, there is a formula for what they call “a variant of the PUCT
algorithm”, and they cite a paper from Christopher Rosin:
http://gauss.ececs.uc.edu/Workshops/isaim2010/papers/rosin.pdf
But that paper has a formula that he calls the PUCB formula, which incorporates
the priors i
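For concreteness, the selection rule as written in the AGZ paper is: pick the child maximizing Q(s,a) + U(s,a), with U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). A minimal sketch (the Node fields here are my own assumptions, not anyone's actual implementation):

import math

def puct_select(children, c_puct=1.0):
    """Return the child with the highest Q + U score.

    Each child is assumed to carry: prior (P from the policy net),
    visits (N), and value_sum (so Q = value_sum / visits).
    """
    total_visits = sum(child.visits for child in children)

    def score(child):
        q = child.value_sum / child.visits if child.visits > 0 else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return q + u

    return max(children, key=score)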
The technique originated with backgammon players in the late 1970s, who would
roll out positions manually. Ron Tiekert (Scrabble champion) also applied the
technique to Scrabble, and I took that idea for Maven. It seemed like people
were using the terms interchangeably.
-Original Message-
regards,
Daniel
On Tue, Mar 6, 2018 at 9:41 AM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:
Training on Stockfish games is guaranteed to produce a blunder-fest, because
there are no blunders in the training set and therefore the policy network
never learns how to refute blunders.
This is not a flaw in MCTS, but rather in the policy network. MCTS will
eventually search every move in
Seems like extraordinarily fast progress. Great to hear that.
-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
"Ingo Althöfer"
Sent: Friday, December 29, 2017 12:30 PM
To: computer-go@computer-go.org
Subject: [Computer-go] Project Leela Zero
I agree that having special knowledge for "pass" is not a big compromise, but
it would not meet the "zero knowledge" goal, no?
-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Rémi Coulom
Sent: Friday, December 29, 2017 7:50 AM
To: computer-g
>I wouldn't find it so surprising if eventually the 20 or 40 block networks
>develop a set of convolutional channels that traces possible ladders
>diagonally across the board.
Learning the deep tactics is more-or-less guaranteed because of the interaction
between search and evaluation throug
Agreed.
You can push this farther. If we define an “error” as a move that flips the W/L
state of a Go game, then only the side that is currently winning can make an
error. Let’s suppose that 6.5 komi is winning for Black. Then Black can make an
error, and after he does then White can make an
AZ scalability looks good in that diagram, and it is certainly a good start,
but it only goes out through 10 sec/move. Also, if the hardware is 7x better
for AZ than SF, then should we elongate the curve for AZ by 7x? Or compress the
curve for SF by 7x? Or some combination? Or take the data at f
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 8:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm
On 7/12/2017 13:20, Brian Sheppard via Computer-go wrote:
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Gian-Carlo Pascutto
Sent: Thursday, December 7, 2017 4:13 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] Mastering Chess and Shogi by Self-Play with a
General Reinforcement Learning Algorithm
On 06-12-17 22:29, Brian Sheppard via Computer-go wrote:
> The chess result is 64-36: a 100 ra
I see the same dynamics that you do, Darren. The 400-game match always has some
probability of being won by the challenger. It is just much more likely if the
challenger is stronger than the champion.
-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Be
Requiring a margin > 55% is a defense against a random result. A 55% score in a
400-game match is 2 sigma.
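The arithmetic, for anyone checking: under the null hypothesis that the two programs are equally strong, the standard deviation of the score fraction over 400 games is sqrt(0.5 * 0.5 / 400) = 2.5%, so 55% sits 2 sigma above 50%.

import math

games, p_null = 400, 0.5
sigma = math.sqrt(p_null * (1 - p_null) / games)  # std dev of the score fraction
print(sigma, (0.55 - p_null) / sigma)             # 0.025  2.0 -> 55% is 2 sigma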
But I like the AZ policy better, because it does not require arbitrary
parameters. It also improves more fluidly by always drawing training examples
from the current probability distributi
The chess result is 64-36: a 100 rating point edge! I think the Stockfish open
source project improved Stockfish by ~20 rating points in the last year. Given
the number of people/computers involved, Stockfish’s annual effort level seems
comparable to the AZ effort.
Stockfish is really, reall
, or is imitating the best-known
algorithm inconvenient for your purposes?
Best,
-Chaz
On Sat, Dec 2, 2017 at 7:31 PM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:
I implemented the ad hoc rule of not training on positions after the first
pass, and my prog
not to resign too early (even before not passing)
On 02/12/2017 at 18:17, Brian Sheppard via Computer-go wrote:
I have some hard data now. My network’s initial training reached the same
performance in half the iterations. That is, the steepness of skill gain in the
first day of training was
Re: [Computer-go] Significance of resignation in AGZ
Brian, do you have any experiments showing what kind of impact it has? It
sounds like you have tried both with and without your ad hoc first pass
approach?
2017-12-01 15:29 GMT-06:00 Brian Sheppard via Computer-go
<computer-go@computer-go.org>:
I have concluded that AGZ's policy of resigning "lost" games early is somewhat
significant. Not as significant as using residual networks, for sure, but you
wouldn't want to go without these advantages.
The benefit cited in the paper is speed. Certainly a factor. I see two other
advantages.
Fi
State of the art in computer chess is alpha-beta search, but note that the
search is very selective because of "late move reductions."
A late-move reduction reduces the search depth for moves after the first move
generated in a node. For example, a simple implementation would be "search the
first mov
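A minimal sketch of the idea in plain negamax form. evaluate(), generate_moves() and pos.play() are hypothetical helpers, and the cutoffs (reduce from the 4th move onward, by one ply) are illustrative, not what a real engine uses:

INF = float("inf")

def lmr_search(pos, depth, alpha, beta):
    """Negamax with a simple late-move-reduction rule (sketch only)."""
    if depth == 0:
        return evaluate(pos)                        # hypothetical static evaluation
    best = -INF
    for i, move in enumerate(generate_moves(pos)):  # assumes best-first move ordering
        child = pos.play(move)                      # hypothetical make-move
        reduction = 1 if (i >= 3 and depth >= 3) else 0
        score = -lmr_search(child, depth - 1 - reduction, -beta, -alpha)
        if reduction and score > alpha:
            # The reduced search beat alpha after all: re-search at full depth.
            score = -lmr_search(child, depth - 1, -beta, -alpha)
        if score > best:
            best = score
        if score > alpha:
            alpha = score
        if alpha >= beta:
            break                                   # beta cutoff
    return best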
I would add that "wild guesses based on not enough info" is an indispensable
skill.
-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Hideki Kato
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Robert Jasiek
Sent: Thursday, October 26, 2017 10:17 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero SGF - Free Use or Copyright?
On 26.10.2017 13:52, Brian Sheppard via Computer-go wrote:
> MCTS is the glue that binds incompatible rules.
This is, h
Robert is right, but Robert seems to think this hasn't been done. Actually
every prominent non-neural MCTS program since Mogo has been based on the exact
design that Robert describes. The best of them achieve somewhat greater
strength than Robert expects.
MCTS is the glue that binds incompatible rules.
I think it uses the champion network. That is, the training periodically
generates a candidate, and there is a playoff against the current champion. If
the candidate wins more than 55% of the games, then a new champion is declared.
Keeping a champion is an important mechanism, I believe. That creates th
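As a sketch, that gating loop would look something like this. train_candidate() and play_match() are hypothetical stand-ins for the training step and the head-to-head match; the 55% / 400-game numbers are the ones discussed here:

def gate_new_networks(champion, threshold=0.55, match_games=400):
    """Keep a champion network; promote a candidate only after a clear match win."""
    while True:
        candidate = train_candidate(champion)
        wins = play_match(candidate, champion, games=match_games)
        if wins / match_games > threshold:   # candidate scored over 55%
            champion = candidate             # declare a new champion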
So I am reading that residual networks are simply better than normal
convolutional networks. There is a detailed write-up here:
https://blog.waya.ai/deep-residual-learning-9610bb62c355
Summary: the residual network has a fixed connection that adds (with no
scaling) the output of the previous layer.
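In code, the skip connection amounts to something like the PyTorch sketch below. The two-convolution, 256-channel layout follows the paper's description; treat the rest as illustrative:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block: two 3x3 convolutions plus an unscaled identity skip."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + x)   # add the input with no scaling, then ReLU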
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Gian-Carlo Pascutto
Sent: Wednesday, October 18, 2017 5:40 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> This paper is required reading. When I read this team’s papers, I
> think to my
Some thoughts toward the idea of general game-playing...
One aspect of Go is ideally suited for visual NN: strong locality of reference.
That is, stones affect stones that are nearby.
I wonder whether the late emergence of ladder understanding within AlphaGo Zero
is an artifact of the board representation.
Sent: Wednesday, October 18, 2017 4:38 PM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo Zero
On 18/10/2017 22:00, Brian Sheppard via Computer-go wrote:
> A stunning result. The NN uses a standard vision architecture (no Go
> adaptation beyond what is necessary to represent the ga
This paper is required reading. When I read this team’s papers, I think to
myself “Wow, this is brilliant! And I think I see the next step.” When I read
their next paper, they show me the next *three* steps. I can’t say enough good
things about the quality of the work.
A stunning result. The
its implementation).
On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go
<computer-go@computer-go.org> wrote:
I understand why most people are saying that AlphaGo is not brute force,
because it appears to be highly selective. But MCTS is a full width search.
Read
stretch when people said that Deep Blue was a brute-force
searcher. If we apply it to AlphaGo as well, the term just means nothing.
Full-width and brute-force are most definitely not the same thing.
Álvaro.
On Sun, Aug 6, 2017 at 2:20 PM, Brian Sheppard via Computer-go
g all possible candidates for the solution and
checking whether each candidate satisfies the problem's statement."
The whole point of the policy network is to avoid brute-force search, by
reducing the branching factor...
On Sun, Aug 6, 2017 at 10:42 AM, Brian Sheppard
Yes, AlphaGo is brute force.
No, it is impossible to solve Go.
Perfect play looks a lot like AlphaGo in that you would not be able to tell the
difference. But I think that AlphaGo still has 0% win rate against perfect play.
My own best guess is that top humans make about 12 errors per game. T
>I haven't tried it, but (with the computer chess hat on) these kind of
>proposals behave pretty badly when you get into situations where your
>evaluation is off and there are horizon effects.
In computer Go, this issue focuses on cases where the initial move ordering is
bad. It isn't so much e
Yes. This is a long-known phenomenon.
I was able to get improvements in Pebbles based on the idea of forgetting
unsuccessful results. It has to be done somewhat carefully, because results
propagate up the tree. But you can definitely make it work.
I recall a paper published on this basis.
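Roughly, the bookkeeping looks like this (a simplified sketch; the wins/visits fields and the decision of when to forget are left open):

def forget_subtree(path):
    """Drop a node's accumulated results and back them out of its ancestors.

    `path` is the list of nodes from the root down to the node whose results
    we want to forget (root first, target last). Without the back-out step,
    the discarded results would still distort every ancestor's statistics.
    """
    node = path[-1]
    for ancestor in path[:-1]:
        ancestor.visits -= node.visits
        ancestor.wins -= node.wins
    node.visits = 0
    node.wins = 0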
>... my value network was trained to tell me the game is balanced at the
>beginning...
:-)
The best training policy is to select positions that correct errors.
I used the policies below to train a backgammon NN. Together, they reduced the
expected loss of the network by 50% (cut the error rate
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Gian-Carlo Pascutto
Sent: Monday, May 22, 2017 4:08 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] mini-max with Policy and Value network
On 20/05/2017 22:26, Brian Sheppard via Computer-go wrote:
> Could use late-move reductions to eliminate the hard pruning
Could use late-move reductions to eliminate the hard pruning. Given the
accuracy rate of the policy network, I would guess that even move 2 should be
reduced.
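For example, a reduction schedule keyed to the move's rank in the policy ordering might look like this (the numbers are guesses, not tuned values):

def reduction_for(move_rank, depth):
    """Depth reduction by policy-ordered move rank (illustrative guesses only)."""
    if move_rank == 0 or depth < 3:
        return 0      # always search the top policy move at full depth
    if move_rank < 4:
        return 1      # even move 2 gets a mild reduction
    return 2          # later moves are reduced harder, instead of hard-pruned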
-Original Message-
From: Computer-go [mailto:computer-go-boun...@computer-go.org] On Behalf Of
Hiroshi Yamashita
Sent: Saturday,
attention here and
here and...".
On Apr 18, 2017 6:31 AM, "Brian Sheppard via Computer-go"
<computer-go@computer-go.org> wrote:
Adding patterns is very cheap: encode the patterns as an if/else tree, and it
is O(log n) to match.
Pattern matching as such did not show up as a significant component of Pebbles.
But that is mostly because all of the machinery that makes pattern-matching
cheap (incremental updating of 3x3 n
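A toy version of the matcher: each level of the tree tests one point of the 3x3 neighborhood, so a lookup costs at most eight tests no matter how many patterns are stored. The dict-based board here is purely illustrative:

EMPTY, BLACK, WHITE, EDGE = 0, 1, 2, 3
NEIGHBORS = [(-1, -1), (0, -1), (1, -1),
             (-1,  0),          (1,  0),
             (-1,  1), (0,  1), (1,  1)]

def add_pattern(tree, colors, value):
    """Insert a pattern (a tuple of 8 neighbor colors) into a nested-dict tree."""
    for c in colors[:-1]:
        tree = tree.setdefault(c, {})
    tree[colors[-1]] = value

def match(tree, board, x, y):
    """Walk the tree using the colors around (x, y); return the stored value or None."""
    node = tree
    for dx, dy in NEIGHBORS:
        node = node.get(board.get((x + dx, y + dy), EDGE))
        if node is None:
            return None
    return node

# Usage sketch: board is a dict mapping (x, y) -> EMPTY/BLACK/WHITE; off-board
# points default to EDGE. A real engine would instead keep an incrementally
# updated 3x3 index per point and compile the tree down to if/else code.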
t value, but this doesn't work
well enough in my experience, where I applied softmax. Schraudolph experimented
with TD for Go in his 1994 paper, where he applied Gibbs sampling for
stochastic move selection, although it wasn't a success for building a strong
Go bot.
On Fri, Feb
Neural networks always have a lot of local optima, simply because they have a
high degree of internal symmetry. That is, you can “permute” sets of
coefficients and get the same function.
Don’t think of starting with expert training as a way to avoid local optima. It
is a way to start training
If your database is composed of self-play games, then the likelihood
maximization policy should gain strength rapidly, and there should be a way to
have asymptotic optimality. (That is, the patterns alone will play a perfect
game in the limit.)
Specifically: play self-play games using an asy
be able to
improve the rollouts at some point.
Roel
On 31 January 2017 at 17:21, Brian Sheppard via Computer-go
wrote:
If a "diamond" pattern is centered on a 5x5 square, then you have 13 points.
The diagram below will give the idea.
__+__
_+++_
+++++
_+++_
__+__
At o
Sent: Tuesday, January 24, 2017 3:05 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] AlphaGo rollout nakade patterns?
On 23-01-17 20:10, Brian Sheppard via Computer-go wrote:
> only captures of up to 9 stones can be nakade.
I don't really understand this.
http://senseis.xmp.net/?StraightThree
Both constructing this shape and pl
A capturing move has a potential nakade if the string that was removed is among
a limited set of possibilities. Probably AlphaGo has a 13-point bounding
region (e.g., the 13-point star) that it uses as a positional index, and
therefore an 8192-sized pattern set will identify all potential nakade.
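As a sketch of that indexing idea (the 13-point diamond and the 2**13 = 8192 table size follow from the guess above; everything else is illustrative):

# Offsets of the 13-point diamond centered on the capture point.
DIAMOND = [(0, -2),
           (-1, -1), (0, -1), (1, -1),
           (-2, 0), (-1, 0), (0, 0), (1, 0), (2, 0),
           (-1, 1), (0, 1), (1, 1),
           (0, 2)]

def nakade_index(removed_points, center):
    """Encode which diamond points held captured stones as a 13-bit integer,
    indexing a table of 2**13 = 8192 entries."""
    cx, cy = center
    index = 0
    for bit, (dx, dy) in enumerate(DIAMOND):
        if (cx + dx, cy + dy) in removed_points:
            index |= 1 << bit
    return index

# A precomputed table of 8192 booleans would then mark which captured shapes
# are potential nakade and deserve a forced reply in the rollouts.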