On 22/05/2016 23:07, Álvaro Begué wrote:
> Disclaimer: I haven't actually implemented MCTS with NNs, but I have
> played around with both techniques.
>
> Would it make sense to artificially scale down the values before the
> SoftMax is applied, so the probability distribution is not as
> concentrated, and unlikely moves are not penalized as much?
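
(For concreteness, a minimal sketch of the scaling Álvaro describes: dividing the raw network outputs by a temperature greater than 1 before the SoftMax flattens the distribution, so unlikely moves keep more probability mass. The NumPy code and the `temperature` parameter name are my own illustration, not any particular engine's code.)

    import numpy as np

    def softmax_with_temperature(logits, temperature=1.0):
        # Temperature > 1 flattens the distribution, < 1 sharpens it.
        z = np.asarray(logits, dtype=np.float64) / temperature
        z -= z.max()              # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    scores = [4.0, 2.0, 0.5]
    print(softmax_with_temperature(scores, 1.0))  # mass concentrated on the top move
    print(softmax_with_temperature(scores, 4.0))  # much flatter distribution
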
Not really. You can just treat the values as priors in UCT, and then it's
a classical exploration/exploitation dilemma again. No need to fiddle with
the NN itself. You have the search precisely to find out whether the NN is
wrong, after all.

Unsurprisingly, self-play favors extreme selectivity, but this does not
hold up against other opponents. I wonder if anything went wrong there for
CS :-)

--
GCP
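
(To make "treat the values as priors in UCT" concrete, here is a rough PUCT-style selection sketch in Python: the policy probability scales the exploration bonus, so a low-prior move still gets revisited once its siblings have been searched enough. The node fields and the c_puct constant are my own illustration, not GCP's actual code.)

    import math

    def select_child(node, c_puct=1.5):
        # PUCT-style selection: the NN policy output acts as a prior that
        # scales the exploration bonus; the search, not the SoftMax shape,
        # decides when a low-prior move deserves more visits.
        # `node.children` is assumed to map moves to child nodes carrying
        # visit_count, total_value and the prior from the policy net.
        sqrt_parent = math.sqrt(max(1, node.visit_count))
        best_move, best_score = None, -float("inf")
        for move, child in node.children.items():
            q = child.total_value / child.visit_count if child.visit_count else 0.0
            u = c_puct * child.prior * sqrt_parent / (1 + child.visit_count)
            if q + u > best_score:
                best_move, best_score = move, q + u
        return best_move
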