6 22:52:31
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the
winning player
On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom < remi.cou...@free.fr > wrote:
It makes the policy stronger because it makes it more deterministic. The greedy
policy is way stronger than th
n der Werf
Sent: Sunday, December 11, 2016 6:51 AM
To: computer-go
Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the
winning player
Detlef, I think your result makes sense. For games between near-equally strong
players the winning player's moves will not be much
explains the whole effect?
>
> Rémi
>
> - Original message -
> From: "Detlef Schmicker"
> To: computer-go@computer-go.org
> Sent: Sunday, December 11, 2016 11:38:08
> Subject: [Computer-go] Some experiences with CNN trained on moves by the
> winning player
r-go] Some experiences with CNN trained on moves by the winning
player
I want to share some experience training my policy cnn:
As I wondered why reinforcement learning was so helpful, I trained
on the GoGoD database using only the moves by the winner of
each game.
Interestingly the predi
On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker wrote:
> Hi Erik,
>
> as far as I understood it, it was 250 Elo in the policy network alone ...
Two problems: (1) it is a self-play result, (2) the policy was tested
as a stand-alone player.
A policy trained to win games will beat a policy trained to
Hi Erik,
as far as I understood it, it was 250 Elo in the policy network alone ...
section 2, Reinforcement Learning of Policy Networks:
We evaluated the performance of the RL policy network in game play,
sampling each move (...) from its output probability distribution over
actions. When played
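To make the distinction in this thread concrete, here is a minimal sketch of sampling a move from the policy output versus playing greedily. The dictionary of move probabilities and the function names (sample_move, greedy_move, sharpen) are hypothetical, not taken from any of the programs discussed; sharpen only illustrates how a lower temperature makes a policy more deterministic.

```python
import random

def sample_move(probs, rng=random):
    """Draw one move from the policy's output distribution."""
    moves = list(probs)
    weights = [probs[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

def greedy_move(probs):
    """Always play the move with the highest probability."""
    return max(probs, key=probs.get)

def sharpen(probs, temperature):
    """Raise each probability to 1/temperature and renormalize.

    A temperature below 1 concentrates mass on the top moves, so the
    policy becomes more deterministic; as it approaches 0 the result
    approaches the greedy policy.
    """
    scaled = {m: p ** (1.0 / temperature) for m, p in probs.items()}
    total = sum(scaled.values())
    return {m: s / total for m, s in scaled.items()}

# Hypothetical policy output for three candidate moves.
policy_output = {"D4": 0.6, "Q16": 0.3, "C3": 0.1}
```

A stand-alone player that calls sample_move will sometimes play the 10% move; one that calls greedy_move never will, which is the effect discussed above.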
Detlef, I think your result makes sense. For games between
near-equally strong players the winning player's moves will not be
much better than the losing player's moves. The game is typically
decided by subtle mistakes. Even if nearly all my moves are perfect,
just one blunder can throw the game.
I want to share some experience training my policy cnn:
As I wondered why reinforcement learning was so helpful, I trained
on the GoGoD database using only the moves by the winner of
each game.
Interestingly the prediction rate of these moves was slightly higher
(without training, just tak
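The winner-only selection described in this message could be sketched roughly as follows. The game record layout (a dict with "winner" and "moves" keys) and the name winner_moves are assumptions made for illustration; they are not the actual training pipeline used in the experiment.

```python
def winner_moves(game):
    """Yield (move_index, color, move) for moves played by the winner.

    `game` is a hypothetical record: {"winner": "B" or "W",
    "moves": [(color, move), ...]} in the order played.
    """
    winner = game["winner"]
    for index, (color, move) in enumerate(game["moves"]):
        if color == winner:
            yield index, color, move

# Hypothetical four-move game won by White.
example_game = {
    "winner": "W",
    "moves": [("B", "D4"), ("W", "Q16"), ("B", "C3"), ("W", "R4")],
}
training_moves = list(winner_moves(example_game))
```

In a real training set each kept move would be paired with the board position before it; that step is omitted here.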