Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-13 Thread Rémi Coulom
On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom <remi.cou...@free.fr> wrote: It makes the policy stronger because it makes it more deterministic. The greedy policy is way stronger than the…
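A minimal sketch of the distinction Rémi is drawing, contrasting greedy (argmax) move selection with sampling from the network's output distribution; the names `probs` and `pick_move` are illustrative, not from anyone's actual engine:

import numpy as np

def pick_move(probs, greedy=True, rng=None):
    # probs: policy network output, a probability distribution over board
    # points (illegal moves assumed already masked out and renormalized).
    rng = rng or np.random.default_rng()
    if greedy:
        # Deterministic: always play the highest-probability move.
        return int(np.argmax(probs))
    # Stochastic: sample proportionally to the network's probabilities,
    # as in the AlphaGo RL evaluation quoted later in this thread.
    return int(rng.choice(len(probs), p=probs))

# Toy 3-point "board": greedy always plays point 0; sampling
# occasionally plays the weaker moves 1 and 2.
probs = np.array([0.6, 0.3, 0.1])
print(pick_move(probs, greedy=True))
print(pick_move(probs, greedy=False))

Tested as a stand-alone player, the greedy version avoids the self-inflicted noise of sampling, which is Rémi's point about determinism itself adding strength.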

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Brian Sheppard
From: Erik van der Werf. Sent: Sunday, December 11, 2016 6:51 AM. To: computer-go. Subject: Re: [Computer-go] Some experiences with CNN trained on moves by the winning player. Detlef, I think your result makes sense. For games between near-equally strong players the winning player's moves will not be much…

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Álvaro Begué
…explains the whole effect? > > Rémi > > - Original Message - > From: "Detlef Schmicker" > To: computer-go@computer-go.org > Sent: Sunday, December 11, 2016 11:38:08 > Subject: [Computer-go] Some experiences with CNN trained on moves by the winning player…

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Rémi Coulom
…[Computer-go] Some experiences with CNN trained on moves by the winning player. I want to share some experience training my policy CNN, as I wondered why reinforcement learning was so helpful. I trained on the GoGoD database using only the moves by the winner of each game. Interestingly, the prediction rate of these moves was slightly higher…

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Erik van der Werf
On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker wrote: > Hi Erik, > > as far as I understood it, it was 250 Elo in the policy network alone ... Two problems: (1) it is a self-play result, (2) the policy was tested as a stand-alone player. A policy trained to win games will beat a policy trained to…

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Detlef Schmicker
Hi Erik, as far as I understood it, it was 250 Elo in the policy network alone ... From Section 2, "Reinforcement Learning of Policy Networks": "We evaluated the performance of the RL policy network in game play, sampling each move (...) from its output probability distribution over actions. When played…"
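To put the 250 Elo figure in context, the standard logistic Elo model converts a rating gap to an expected win rate (a textbook formula, not something quoted in the thread):

import math

def elo_to_winrate(diff):
    # Expected score of the stronger side under the logistic Elo model.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def winrate_to_elo(p):
    # Inverse: rating gap implied by a win rate p.
    return 400.0 * math.log10(p / (1.0 - p))

print(elo_to_winrate(250))   # ~0.81, i.e. roughly 4 wins out of 5
print(winrate_to_elo(0.81))  # ~252

So 250 Elo corresponds to winning roughly four games in five, which is why Erik's caveats about how the number was measured matter.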

Re: [Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Erik van der Werf
Detlef, I think your result makes sense. For games between near-equally strong players the winning player's moves will not be much better than the losing player's moves. The game is typically decided by subtle mistakes. Even if nearly all my moves are perfect, just one blunder can throw the game.

[Computer-go] Some experiences with CNN trained on moves by the winning player

2016-12-11 Thread Detlef Schmicker
I want to share some experience training my policy CNN, as I wondered why reinforcement learning was so helpful. I trained on the GoGoD database using only the moves by the winner of each game. Interestingly, the prediction rate of these moves was slightly higher (without training, just taking…
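For concreteness, a minimal sketch of the data selection being described: keep only the winner's (position, move) pairs when building the training set. The Game/Move records and the SGF RE[] reference are assumptions about the pipeline, not Detlef's actual code:

from dataclasses import dataclass

@dataclass
class Move:
    color: str        # 'B' or 'W'
    position: object  # board tensor before the move (network input)
    point: int        # point played (training target)

@dataclass
class Game:
    winner: str       # 'B' or 'W', e.g. parsed from the SGF RE[] property
    moves: list

def winner_moves_only(games):
    # Yield (input, target) pairs from the winning player's moves only,
    # discarding every move made by the loser.
    for game in games:
        for move in game.moves:
            if move.color == game.winner:
                yield move.position, move.point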