Re: [Computer-go] mini-max with Policy and Value network

Gian-Carlo Pascutto Mon, 22 May 2017 09:28:14 -0700

On 22-05-17 15:46, Erik van der Werf wrote:
> Oh, haha, after reading Brian's post I guess I misunderstood :-)
> 
> Anyway, LMR seems like a good idea, but last time I tried it (in Migos)
> it did not help. In Magog I had some good results with fractional depth
> reductions (like in Realization Probability Search), but it's a long
> time ago and the engines were much weaker then...


What was generating your probabilities, though? A strong policy DCNN or
something weaker?

ERPS (LMR with fractional reductions based on move probabilities) with
alpha-beta seems very similar to having MCTS with the policy prior being
a factor in the UCT formula. This is what AlphaGo did according to their
2015 paper, so it can't be terrible, but it does mean that you are 100%
blind to something the policy network doesn't see, which seems
worrisome. I think I asked Aja once about what they do with first play
urgency given that the paper doesn't address it - he politely ignored
the question :-)

The obvious defense (when looking at it in alpha-beta formulation) would
be to cap the depth reduction, and (in MCTS/UCT formulation) to cap the
minimum probability. I had no success with this in Go so far.

-- 
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go

Re: [Computer-go] mini-max with Policy and Value network

Reply via email to