Re: [Computer-go] mini-max with Policy and Value network

Hideki Kato Tue, 23 May 2017 10:09:38 -0700

Gian-Carlo Pascutto: <0357614a-98b8-6949-723e-e1a849c75...@sjeng.org>:


>Now, even the original AlphaGo played moves that surprised human pros
>and were contrary to established sequences. So where did those come
>from? Enough computation power to overcome the low probability?
>Synthesized by inference from the (much larger than mine) policy network?

Demis Hassabis said in a talk:
After the match with Lee Sedol, the team used "adversarial learning" to
fill the holes in the policy net (such as Sedol's winning move in
game 4).

Hideki

-- 
Hideki Kato <mailto:hideki_ka...@ybb.ne.jp>
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
