Date: Wednesday, 30 March 2016 at 18:59
To: "computer-go@computer-go.org"
Subject: Re: [Computer-go] Congratulations to AlphaGo (Statistical significance of results)
Or, if it's lopsided far from 1/2, Wilson's is just as good, in my
experience.
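For reference, a minimal sketch (mine, not from the thread) of the Wilson score interval for 4 wins out of 5, treating the games as independent Bernoulli trials; the 95% z-value and the pure-Python layout are my own choices.

import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p_hat = wins / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

# 4 wins out of 5 games, treated as independent coin flips:
print(wilson_interval(4, 5))  # roughly (0.38, 0.96); the interval still contains 0.5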
On Mar 30, 2016 10:29 AM, "Olivier Teytaud" wrote:
don't use asymptotic normality with a sample size of 5; use Fisher's exact test.
The p-value for the rejection of
"P(AlphaGo wins a given game against Lee Sedol) < .5"
might be something like 3/16
(under the "independent coin" assumption!)
This is not 0.05, but still quite an impressive result :-)
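The 3/16 figure is easy to check directly; with a single binomial sample the exact test reduces to a one-sided binomial tail. A small sketch (my own, assuming a fair coin under the null):

from math import comb

def binom_tail(wins, n, p=0.5):
    """P(X >= wins) for X ~ Binomial(n, p): the one-sided exact p-value."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(wins, n + 1))

# Probability of 4 or more wins out of 5 under the fair-coin null:
print(binom_tail(4, 5))  # 6/32 = 3/16 = 0.1875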
Hey Simon,
I only now remembered: we actually experimented on the effect of making one blunder (a random move instead of the learned/searched move) in Go and Hex, in "Blunder Cost in Go and Hex", so this might be a starting point for your question of measuring player strength by measuring all move strengths.
In my original post I put a link to the relevant section of the MacKay book that shows exactly how to calculate the probability of superiority, assuming the game outcome is modelled as a biased coin toss:
http://www.inference.phy.cam.ac.uk/itila/
I was making the point that for this
and for o
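A numerical sketch of that probability-of-superiority calculation (my own rendering, not taken from the MacKay book itself): with win/loss modelled as a biased coin and a uniform prior on the win probability, the posterior after 4 wins and 1 loss is Beta(5, 2), and its mass above one half comes out at 57/64.

def prob_superiority(wins, losses, grid=200000):
    """P(p > 1/2 | data) under a uniform prior on the per-game win probability p,
    i.e. the mass of the Beta(wins+1, losses+1) posterior above one half,
    integrated numerically so no external libraries are needed."""
    a, b = wins + 1, losses + 1
    density = lambda p: p ** (a - 1) * (1 - p) ** (b - 1)
    xs = [(i + 0.5) / grid for i in range(grid)]
    total = sum(density(x) for x in xs)
    above = sum(density(x) for x in xs if x > 0.5)
    return above / total

print(prob_superiority(4, 1))  # ~0.8906, i.e. 57/64 under this model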
"do they have positive or negative correlation?" intriguing question,
Petri. Intuitively, we might arbitrarily divide the human population
into two groups; one which is discouraged by failure, and the other
which takes the Lady MacBeth attitude of "screw your courage to the
sticking point, and we
Since there are only two possible outcomes, it is pretty much normal. Actually binomial, which will converge to normal given enough samples.
The only thing that can distort it is that consecutive games are not independent (which is probably the case, but do they have positive or negative correlation?)
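On the correlation point, here is a toy simulation (entirely my own construction; the base win probability and the `carry` parameter are invented for illustration) showing that positive correlation between consecutive games shifts probability mass away from a 4-1 score and towards sweeps:

import random

def match_score(n_games=5, p=0.7, carry=0.0):
    """Simulate one match in which winning a game shifts the win probability
    of the next game by +carry and losing shifts it by -carry."""
    wins, prob = 0, p
    for _ in range(n_games):
        won = random.random() < prob
        wins += won
        prob = min(max(p + (carry if won else -carry), 0.0), 1.0)
    return wins

random.seed(0)
for carry in (0.0, 0.2):  # independent games vs. positively correlated games
    scores = [match_score(carry=carry) for _ in range(100000)]
    print(carry,
          "P(4 wins) ~", round(sum(s == 4 for s in scores) / len(scores), 3),
          "P(5 wins) ~", round(sum(s == 5 for s in scores) / len(scores), 3))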
I think the error here is that the game outcome is not a normally distributed random value.
Dmitry
30.03.2016, 12:57, "djhbrown ." :
Simon wrote: "I was discussing the results with a colleague outside
of the Game AI area the other day when he raised
the question (which applies to nearly all sporting events,
given the small sample size involved)
of statistical significance - suggesting that on another week
the result might have b
On 22 March 2016 at 21:43, Darren Cook wrote:
< snip >
>
> C'mon DeepMind, put that same version on KGS, set to only play 9p
> players, with the same time controls, and let's get 40 games to give it
> a proper ranking. (If 5 games against Lee Sedol are useful, 40 games
> against a range of playe
On 23.03.2016 15:32, Petr Baudis wrote:
these are beautiful posts.
https://massgoblog.wordpress.com/2016/03/11/lee-sedols-strategy-and-alphagos-weakness/
Before you become too excited, also read my comments on the commentary:
http://www.lifein19x19.com/forum/viewtopic.php?p=200539#p200539
Thank you, these are beautiful posts. I enjoyed very much reading a writeup by a Go professional who also took the effort to understand the principles behind MCTS programs as well as to develop a basic intuition of the gameplay artifacts, strengths and weaknesses of MCTS. It also nicely describes the
FYI. We have translated 3 posts by Li Zhe 6p into English.
https://massgoblog.wordpress.com/2016/03/11/lee-sedols-strategy-and-alphagos-weakness/
https://massgoblog.wordpress.com/2016/03/11/game-2-a-nobody-could-have-done-a-better-job-than-lee-sedol/
https://massgoblog.wordpress.com/2016/03/15/bef
> ...
> Pro players who are not familiar with MCTS bot behavior will not see this.
I stand by this:
>> If you want to argue that "their opinion" was wrong because they don't
>> understand the game at the level AlphaGo was playing at, then you can't
>> use their opinion in a positive way either.
Hi Darren,
"Darren Cook"
> ... But, there were also numerous moves where
> the 9-dan pros said, that in *their* opinion, the moves were weak/wrong.
> E.g. wasting ko threats for no reason. Moves even a 1p would never make.
>
> If you want to argue that "their opinion" was wrong because they don't
"Lucas, Simon M"
> my point is that I *think* we can say more (for example
> by not treating the outcome as a black-box event,
> but by appreciating the skill of the individual moves)
* Human professional players were full of praise for some of
AlphaGo's moves, for instance move 37 in game 2.
> ... we witnessed hundreds of moves vetted by 9dan players, especially
> Michael Redmond's, where each move was vetted.
This is a promising approach. But, there were also numerous moves where the 9-dan pros said that, in *their* opinion, the moves were weak/wrong. E.g. wasting ko threats for no reason. Moves even a 1p would never make.
This is somewhat moot - if any moves had been significantly and obviously weak to any observers, the result wouldn't have been 4-1. I.e. one bad move out of 5 games would give roughly the same strength information as one loss out of 5 games; consider that the kibitzing was being done in real time
I think you are reinforcing Simon's original point; i.e. using a more fine-grained approach to statistically approximate AlphaGo's ELO, where fine-grained is the degree of vetting per move and/or a series of moves. That is a substantially larger sample size and each sample will have a pretty high degree of
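To put rough numbers on the "larger sample" intuition, here is my own back-of-envelope sketch: it pretends each vetted move can be scored pass/fail as an independent trial (moves within a game certainly are not independent) and it leaves aside how a per-move success rate would map onto an Elo figure.

import math

def interval_width(n, p_hat=0.8, z=1.96):
    """Width of a plain normal-approximation confidence interval for a
    success proportion, to show how sample size drives precision."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

print(round(interval_width(5), 2))    # ~0.70, with five game results
print(round(interval_width(300), 2))  # ~0.09, with a few hundred individually vetted moves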
Given the minimal sample size, bothering over this question won't amount to much. I think the proper response is that no one thought we'd see this level of play at this point in our AI efforts, and to point to the fact that we witnessed hundreds of moves vetted by 9dan players, especially Michael Redmond's, where each move was vetted.
another interesting question is to judge the bot's strength
by watching the facial gestures and body language of Lee Sedol
with each move...
On 22 March 2016 at 17:20, Álvaro Begué wrote:
> A very simple-minded analysis is that, if the null hypothesis is that
> AlphaGo and Lee Sedol are equally strong, AlphaGo would do as well as we
> observed or better 15.625% of the time. That's a p-value that even social
> scientists don't get exci
A very simple-minded analysis is that, if the null hypothesis is that
AlphaGo and Lee Sedol are equally strong, AlphaGo would do as well as we
observed or better 15.625% of the time. That's a p-value that even social
scientists don't get excited about. :)
Álvaro.
Statistical significance requires a null hypothesis... I think it's probably easiest to ask the question: if I assume an ELO difference of x, how likely is a 4-1 result?
Turns out that 220 to 270 ELO has a 41% chance of that result.
>= 10% is -50 to 670 ELO
>= 1% is -250 to 1190 ELO
My numbers
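The calculation behind figures like these can be sketched as follows (my own code, assuming the standard logistic Elo model and independent games; I have not re-derived the exact interval endpoints quoted above):

from math import comb

def p_win(elo_diff):
    """Per-game win probability implied by an Elo difference (logistic model)."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

def p_exact_score(elo_diff, wins=4, n=5):
    """Probability of exactly `wins` wins in `n` independent games."""
    p = p_win(elo_diff)
    return comb(n, wins) * p ** wins * (1 - p) ** (n - wins)

for d in (0, 100, 245, 500):
    print(d, round(p_exact_score(d), 3))
# An Elo edge of roughly 245 maximises the chance of a 4-1 score at about 41%,
# consistent with the range quoted above.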
> I'm not sure if we can say with certainty that AlphaGo is a significantly
> better Go player than Lee Sedol at this point. What we can say with
> certainty is that AlphaGo is in the same ballpark and at least roughly
> as strong as Lee Sedol. To me, that's enough to be really huge on its
> own ac
On Tue, Mar 22, 2016 at 04:00:41PM, Lucas, Simon M wrote:
> With AlphaGo winning 4 games to 1, from a simplistic
> stats point of view (with the prior assumption of a fair
> coin toss) you'd not be able to claim much statistical
> significance, yet most (me included) believe that
> AlphaGo i
Simon,
There's no argument better than evidence, and no evidence available to us
other than *all* of the games that AlphaGo has played publicly.
Among two humans, a 4-1 result wouldn't indicate any more or less than this
4-1 result, but we'd already have very strong elo-type information about
bot
Hi all,
I was discussing the results with a colleague outside
of the Game AI area the other day when he raised
the question (which applies to nearly all sporting events,
given the small sample size involved)
of statistical significance - suggesting that on another week
the result might have been 4