A policy net generalizing from shoulder-hit moves on lower lines may come to prefer the move in question.
Hideki

Gian-Carlo Pascutto: <df55c9d4-2f0a-d902-af71-7677497fc...@sjeng.org>:
>On 23-05-17 17:19, Hideki Kato wrote:
>> Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>:
>>
>>> Now, even the original AlphaGo played moves that surprised human pros
>>> and were contrary to established sequences. So where did those come
>>> from? Enough computation power to overcome the low probability?
>>> Synthesized by inference from the (much larger than mine) policy network?
>>
>> Demis Hassabis said in a talk:
>> After the game with Sedol, the team used "adversarial learning" in
>> order to fill the holes in policy net (such as the Sedol's winning
>> move in the game 4).
>
>I said, the "original AlphaGo", i.e. the one used in the match against
>Lee Sedol. According to the Nature paper, the policy net was trained
>with supervised learning only [1]. And yet...
>
>In the attached SGF, AlphaGo played P10, which was considered a very
>surprising move by all commentators. Presumably, this means it's not
>seen in high level human play, and would not get a high rating in the
>policy net. I can sort-of confirm this:
>
>0.295057654 (E13)
>...(60 more moves follow)...
>0.000011952 (P10)
>
>So, 0.001% probability. Demis commented that Lee Sedol's winning move in
>game 4 was a one in 10 000 move. This is a 1 in 100 000 move.
>Differently trained policy nets might rate it a bit higher or lower, but
>simply due to the fact that was considered very un-human to do, it seems
>unlikely to ever be rated highly by a policy net based on supervised
>learning.
>
>So in AlphaGo's formula, you're dealing with a reduction of the UCT term
>by a factor 100 000 plus or minus some order of magnitude.
>
> D6 -> 1359934 (W: 53.21%) (U: 49.34%) (V: 55.15%: 38918) (N: 6.3%)
>PV: D6 F6 E7 F7 C8 B8 D7 B7 E9 C9 F8 H7 H9 K7 H3 K9
>...many moves...
> P10 -> 421 (W: 52.68%) (U: 50.09%) (V: 53.98%: 8) (N: 0.0%)
>PV: P10 Q10 P8 Q9
>
>Now, of course AlphaGo had a few orders of magnitude more hardware, but
>you can see from the above that it's, eh, not easy for P10 to overtake
>the top moves here in playout count.
>
>And yet, that's the move that was played.
>
>[1] I'm assuming that what played the match corresponds to what they
>published there - maybe that is my mistake. I'm not sure I remember the
>relevant timeline correctly.

--
Hideki Kato <mailto:hideki_ka...@ybb.ne.jp>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
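To make the point about the UCT term in AlphaGo's formula concrete, here is a minimal Python sketch, assuming the PUCT variant described in the AlphaGo Nature paper: select the move maximizing Q(s,a) + u(s,a), with u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)). It reuses the two policy priors quoted in the thread. The constant c_puct = 5.0, the helper names puct_bonus and select, and all Q values, D6 prior and visit counts in the second example are hypothetical, chosen only for illustration; this is not AlphaGo's or Leela's actual code.

```python
import math

# Policy priors quoted in the thread (supervised-learning policy net).
PRIOR_E13 = 0.295057654
PRIOR_P10 = 0.000011952


def puct_bonus(prior, parent_visits, child_visits, c_puct=5.0):
    """Exploration term u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).

    c_puct = 5.0 is an assumed constant for illustration.
    """
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select(q, prior, visits, c_puct=5.0):
    """Return the move maximizing Q(s,a) + u(s,a); all dicts are keyed by move."""
    total = sum(visits.values())
    return max(q, key=lambda a: q[a] + puct_bonus(prior[a], total, visits[a], c_puct))


# With equal visit counts the sqrt and (1 + N) factors cancel,
# so the ratio of exploration bonuses is just the ratio of priors.
ratio = puct_bonus(PRIOR_E13, 10_000, 0) / puct_bonus(PRIOR_P10, 10_000, 0)
print(f"E13 gets {ratio:,.0f}x the exploration bonus of P10")

# Hypothetical numbers: P10 has the slightly higher value estimate,
# but its tiny prior leaves its bonus far below D6's, so D6 is selected.
q = {"D6": 0.53, "P10": 0.535}
prior = {"D6": 0.063, "P10": PRIOR_P10}
visits = {"D6": 1000, "P10": 8}
print(select(q, prior, visits))
```

With equal visit counts, P10's bonus comes out roughly 24,700 times smaller than E13's, and about 84,000 times smaller than it would be with a prior of 1, which fits the "factor 100 000 plus or minus some order of magnitude" estimate. The second example shows why a better value estimate alone does not pull such a move up in playout count: its bonus is too small to matter until the competing moves' bonuses shrink a great deal.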