On 22/05/2016 23:07, Álvaro Begué wrote:
> Disclaimer: I haven't actually implemented MCTS with NNs, but I have
> played around with both techniques.
> 
> Would it make sense to artificially scale down the values before the
> SoftMax is applied, so the probability distribution is not as
> concentrated, and unlikely moves are not penalized as much?
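[The scaling Álvaro describes amounts to a softmax temperature: dividing the raw values by a temperature above 1 flattens the resulting distribution, so low-scoring moves keep more probability mass. A minimal sketch (function name and plain-list interface are illustrative):]

```python
import math

def softmax(values, temperature=1.0):
    # Temperature > 1 scales the values down before exponentiation,
    # flattening the distribution; temperature < 1 sharpens it.
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```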

Not really; you can just treat the values as priors in UCT, and then it's
a classical exploration/exploitation dilemma again. No need to fiddle
with the NN itself. You have the search precisely to find out whether
the NN is wrong, after all.
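[A PUCT-style selection rule, as one common way to use the policy output as a prior in the tree rather than reshaping the NN's distribution; the dict-based node layout and `c_puct` constant are illustrative, not from any particular engine:]

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U, where U is an exploration
    bonus weighted by the NN policy prior. Rarely-visited moves with
    a decent prior still get explored, even if the prior is peaked."""
    total_visits = sum(ch["visits"] for ch in children)
    best, best_score = None, -float("inf")
    for ch in children:
        # Mean value of the subtree so far (0 if never visited).
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        # Prior-weighted exploration term; shrinks as visits accumulate.
        u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
        if q + u > best_score:
            best, best_score = ch, q + u
    return best
```

The point of the search term is exactly the one above: the prior only biases which branches get visits first, and the Q term takes over once the search has evidence the NN was wrong.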

Unsurprisingly, self-play favors extreme selectivity, but this does not
hold up against other opponents. I wonder if anything went wrong there
for CS :-)

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go