Hi all,
The match was on German TV (ZDF) today, in an 8-second snippet.
It included a scene from the press conference:
http://www.dgob.de/yabbse/index.php?action=dlattach;topic=6322.0;attach=5391;image
I recognise Cho Chikun in the center and Hideki Kato, second from left.
Who are the other three?
On 17-11-16 22:38, Hiroshi Yamashita wrote:
> Value Net is 32 Filters, 14 Layers.
> 32 5x5 x1, 32 3x3 x11, 32 1x1 x1, fully connect 256, fully connect tanh 1
I think this should be:
32 5x5 x1, 32 3x3 x11, 1 1x1 x1, fully connect 256, fully connect tanh 1
Else one has a 361 * 32 * 256 fully connected layer with about 3 million weights.
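For what it's worth, here is a minimal sketch (my own reconstruction, not Hiroshi's code) of the corrected layout: one 5x5 convolution with 32 filters, eleven 3x3 convolutions with 32 filters, a single 1x1 filter, a 256-unit fully connected layer and a tanh output. The number of input feature planes and the board size are assumptions; 19x19 matches the 361 above, although 13x13 games are mentioned later in the thread.

import torch
import torch.nn as nn

class ValueNet(nn.Module):
    # Sketch of the layer counts under discussion; in_planes is a guess.
    def __init__(self, in_planes=2, board=19):
        super().__init__()
        layers = [nn.Conv2d(in_planes, 32, 5, padding=2), nn.ReLU()]
        for _ in range(11):                        # 32 3x3 x11
            layers += [nn.Conv2d(32, 32, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(32, 1, 1)]            # 1 1x1 x1, not 32
        self.conv = nn.Sequential(*layers)
        self.fc1 = nn.Linear(board * board, 256)   # 361 -> 256
        self.fc2 = nn.Linear(256, 1)               # fully connect tanh 1

    def forward(self, x):
        h = self.conv(x).flatten(1)
        h = torch.relu(self.fc1(h))
        return torch.tanh(self.fc2(h))             # scalar evaluation in [-1, 1]

With a single 1x1 filter, the first fully connected layer has 361 * 256 ≈ 92k weights instead of the roughly 3 million of a 361 * 32 * 256 layer.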
You are absolutely right. I was still thinking in terms of the RL policy
network and assumed everything was about that, sorry.
On 21.11.2016 at 15:22, Gian-Carlo Pascutto wrote:
> On 20-11-16 11:16, Detlef Schmicker wrote:
>> Hi Hiroshi,
>>
>>> Now I'm making 13x13 selfplay games like AlphaGo paper. 1. ma
Yes, I think the important thing about the value function is to detect
moves that are very bad, so that MC-eval does not have to sample more
than once for many variations.
If the evaluation function was trained on pro moves only, it would not
know what a bad move looks like. At least the evaluatio
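As an illustration only (my own sketch, not code from any engine mentioned here): one way to use the value net like this is to let it veto clearly lost candidate moves, so that the single Monte-Carlo sample per variation is only spent on the remaining ones. The value_net, mc_rollout and play hooks are hypothetical placeholders.

def evaluate_children(position, moves, value_net, mc_rollout, play,
                      bad_threshold=-0.9):
    # Score each candidate move in [-1, 1]; skip MC sampling for moves
    # the value net already considers clearly lost.
    scores = {}
    for move in moves:
        child = play(position, move)
        v = value_net(child)
        if v < bad_threshold:
            # Value net says this move is very bad: keep its score and
            # do not spend a Monte-Carlo sample on it.
            scores[move] = v
        else:
            # Otherwise one MC sample is enough; mix it with the net's value.
            scores[move] = 0.5 * v + 0.5 * mc_rollout(child)
    return scores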
On 20-11-16 11:16, Detlef Schmicker wrote:
> Hi Hiroshi,
>
>> Now I'm making 13x13 selfplay games like the AlphaGo paper. 1. make a
>> position by Policy(SL) probability from the initial position. 2. play a
>> move uniformly at random from the available moves. 3. play the remaining
>> moves by Policy(RL) to the end. (
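For readers who do not have the AlphaGo paper at hand, here is a rough sketch of those three steps as I read the quoted description. All game and policy hooks (new_game, legal_moves, play, is_over, result, sample_sl_move, sample_rl_move) are hypothetical placeholders, not interfaces from Hiroshi's program.

import random

def make_selfplay_example(new_game, legal_moves, play, is_over, result,
                          sample_sl_move, sample_rl_move, max_sl_moves):
    # Generate one value-net training pair (position, outcome).
    game = new_game()                        # 13x13 board in the quoted setup

    # 1. Make a position by sampling moves from Policy(SL).
    for _ in range(random.randrange(max_sl_moves)):
        play(game, sample_sl_move(game))

    # 2. Play one move uniformly at random from the available moves.
    play(game, random.choice(legal_moves(game)))
    training_position = game.copy()          # assumes the game object can be copied

    # 3. Play the remaining moves with Policy(RL) to the end.
    while not is_over(game):
        play(game, sample_rl_move(game))

    # Label the stored position with the final game outcome (e.g. +1 / -1).
    return training_position, result(game)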