My guess is that there is a threshold that depends on the relative
strength of MC eval and the NN's value function.
If the value function is stronger than MC eval, I would guess that MC eval
turns into a noisy feature with little benefit.
Since the strength of MC eval varies between engines, this threshold is
probably very different from engine to engine. Also, I can imagine that an
NN value function can have some gaping holes in its knowledge that even a
simple MC eval can patch up. This is probably true for supervised learning,
where the training data likely has a lot of holes because bad moves are not
in the data.
The Zero approach is different because it should converge to perfection
in the limit, thus overcoming any weaknesses the value function has early
on. At least in theory.
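
To make the threshold idea concrete, here is a toy sketch, not taken from
any engine. It assumes the value net and MC eval both give unbiased
estimates of the true outcome with independent errors, and that the engine
blends them with a fixed weight. The names var_value, var_mc and lam are
hypothetical, introduced only for this illustration. Under those
assumptions, a 50/50 blend beats the value net alone only while the MC
error variance is less than three times the value net's.

```python
def mixed_variance(var_value, var_mc, lam):
    """Error variance of (1 - lam) * value + lam * mc_eval.

    Assumes both estimators are unbiased and their errors are independent.
    """
    return (1.0 - lam) ** 2 * var_value + lam ** 2 * var_mc


def best_lambda(var_value, var_mc):
    """Weight on MC eval that minimises the mixed variance
    (inverse-variance weighting, same assumptions as above)."""
    return var_value / (var_value + var_mc)


def mc_helps(var_value, var_mc, lam=0.5):
    """True if blending in MC eval with weight lam lowers the error variance
    compared with using the value function alone."""
    return mixed_variance(var_value, var_mc, lam) < var_value


if __name__ == "__main__":
    # Sweep the MC noise level relative to a fixed value-net noise level.
    for var_mc in (0.5, 1.0, 2.0, 3.0, 4.0, 8.0):
        print(var_mc, mc_helps(1.0, var_mc), round(best_lambda(1.0, var_mc), 3))
```

In this model the best weight on MC eval shrinks toward zero as MC eval
gets noisier but never reaches it, so the "threshold" only appears when the
weight is fixed rather than tuned. Holes in the value function (errors that
are biased or correlated) are outside what this sketch covers.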
On 2018-03-05 14:04, Gian-Carlo Pascutto wrote:
On 5/03/2018 12:28, valky...@phmp.se wrote:
Remi twittered more details here (see the discussion with gghideki):
https://twitter.com/Remi_Coulom/status/969936332205318144
Thank you. So Remi gave up on rollouts as well. Interesting "difference
of opinion" there with Zen.
Last time I tested this in regular Leela, playouts were beneficial, but
this was before combined value+policy nets and much more training data
was available. I do not know what the current status would be.
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go