Hi Erik,

As far as I understood it, it was 250 Elo in the policy network alone ...


Section 2, Reinforcement Learning of Policy Networks:

We evaluated the performance of the RL policy network in game play,
sampling each move (...) from its output probability distribution over
actions. When played head-to-head, the RL policy network won more than
80% of games against the SL policy network.
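
As a rough sanity check, the 250 Elo figure and the "more than 80%" win
rate agree with each other if one assumes the standard logistic Elo
model (an assumption here; the paper's own rating procedure may differ).
The helper names below are hypothetical and only illustrate the
arithmetic:

```python
import math


def elo_diff_from_win_rate(p: float) -> float:
    """Elo difference implied by win probability p (logistic Elo model, assumed)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


def win_rate_from_elo_diff(d: float) -> float:
    """Expected win probability for a player rated d Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


# An 80% win rate corresponds to roughly 241 Elo.
print(round(elo_diff_from_win_rate(0.80)))

# A 250 Elo advantage corresponds to a win rate of roughly 0.808.
print(round(win_rate_from_elo_diff(250.0), 3))
```

So under that model, "more than 80%" and "about 250 Elo" describe
roughly the same gap between the RL and SL policy networks.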

> W.r.t. AG's reinforcement learning results, as far as I know,
> reinforcement learning was only indirectly helpful. The RL policy net
> performed worse than the SL policy net in the overall system. Only by
> training the value net to predict expected outcomes from the
> (over-fitted?) RL policy net they got some improvement (or so they
> claim). In essence this just means that RL may have been effective in
> creating a better training set for SL. Don't get me wrong, I love RL,
> but the reason why the RL part was hyped so much is in my opinion more
> related to marketing, politics and personal ego.


Detlef
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
