IMO, training using only the moves of winners is obviously the practical choice.
Worst case: you "waste" half of your data. But that is actually not a downside
provided that you have lots of data, and as your program strengthens you will
avoid potential data-quality problems.
Asymptotically, yo
On Sun, Dec 11, 2016 at 4:50 PM, Rémi Coulom wrote:
> It makes the policy stronger because it makes it more deterministic. The
> greedy policy is way stronger than the probability distribution.
>
I suspected this was what it was mainly about. Did you run any experiments
to see if that explains th
It makes the policy stronger because it makes it more deterministic. The greedy
policy is way stronger than the probability distribution.
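To illustrate the difference being discussed, here is a minimal sketch (with made-up probabilities, not any actual network output) contrasting the greedy policy, which always plays the argmax move, with sampling from the distribution, which sometimes plays low-probability moves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical policy output: a probability distribution over 361 board points.
probs = rng.dirichlet(np.ones(361) * 0.1)

# Greedy policy: deterministically play the highest-probability move.
greedy_move = int(np.argmax(probs))

# Stochastic policy: sample from the distribution, so weaker
# low-probability moves are occasionally played as well.
sampled_move = int(rng.choice(361, p=probs))

print(greedy_move, sampled_move)
```

The greedy player never "wastes" a move on the tail of the distribution, which is one plausible reading of why it plays stronger as a stand-alone policy.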
Rémi
----- Original Message -----
From: "Detlef Schmicker"
To: computer-go@computer-go.org
Sent: Sunday, December 11, 2016 11:38:08
Subject: [Computer-go] Some exp
On Sun, Dec 11, 2016 at 8:44 PM, Detlef Schmicker wrote:
> Hi Erik,
>
> as far as I understood it, it was 250ELO in policy network alone ...
Two problems: (1) it is a self-play result, (2) the policy was tested
as a stand-alone player.
A policy trained to win games will beat a policy trained to
Hi Erik,
as far as I understood it, it was 250ELO in policy network alone ...
section 2, "Reinforcement Learning of Policy Networks":
We evaluated the performance of the RL policy network in game play,
sampling each move (...) from its output probability distribution over
actions. When played
Detlef, I think your result makes sense. For games between
near-equally strong players the winning player's moves will not be
much better than the losing player's moves. The game is typically
decided by subtle mistakes. Even if nearly all my moves are perfect,
just one blunder can throw the game.
I want to share some experience training my policy cnn:
As I wondered why reinforcement learning was so helpful, I trained
on the GoGoD database using only the moves played by the winner of
each game.
Interestingly, the prediction rate of these moves was slightly higher
(without training, just tak
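The winner-only filtering described above can be sketched roughly as follows. The record format here (a "winner" field, black moving first on even plies) is an assumption for illustration, not Detlef's actual data layout:

```python
def winner_moves(game):
    """Keep only (position, move) pairs where the player to move won the game."""
    examples = []
    for ply, (position, move) in enumerate(zip(game["positions"], game["moves"])):
        color = "B" if ply % 2 == 0 else "W"  # assume black plays first
        if color == game["winner"]:
            examples.append((position, move))
    return examples

# Toy record: four plies, black ("B") wins, so only black's moves are kept.
game = {
    "winner": "B",
    "positions": ["p0", "p1", "p2", "p3"],
    "moves": ["m0", "m1", "m2", "m3"],
}
print(winner_moves(game))  # → [('p0', 'm0'), ('p2', 'm2')]
```

In the worst case this keeps exactly half of the moves, matching the "waste half of your data" trade-off mentioned earlier in the thread.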