However, the formula in the AGZ paper doesn't look like any "UCT variant".
Formula from the paper:

    Cpuct * P(s,a) * sqrt(Sum(N(s,b))) / (1 + N(s,a))

Note that there is no logarithmic term, and the division by N+1 falls outside
the sqrt. For comparison, a normal UCT term looks like:

    sqrt(ln(Sum(N(s,b))) / (1 + N(s,a)))
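
To make the difference concrete, here is a minimal Python sketch of the two
exploration terms as written above. The function and parameter names are
hypothetical (they are not from the paper), and parent_visits stands for
Sum(N(s,b)). It is an illustration under those assumptions, not anyone's
actual engine code.

```python
import math


def agz_exploration(c_puct, prior, parent_visits, child_visits):
    """Exploration term as printed in the AGZ paper.

    No logarithm, and the division by (1 + N(s,a)) sits outside the sqrt.
    """
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def uct_exploration(parent_visits, child_visits):
    """UCT-style term: logarithm and division both inside the sqrt.

    Assumes parent_visits >= 1 so that the logarithm is defined.
    """
    return math.sqrt(math.log(parent_visits) / (1 + child_visits))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show how the two terms scale.
    for parent in (10, 100, 1000, 10000):
        print(parent,
              agz_exploration(1.0, 0.5, parent, 9),
              uct_exploration(parent, 9))
```

With the child visit count held fixed, the AGZ term grows like the square root
of the parent count, while the UCT term grows only like the square root of its
logarithm.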

Since I asked my question, I found that other people have also noticed a 
discrepancy. I saw a post on a DeepChem board about this subject. I also found 
a paper 
(https://webdocs.cs.ualberta.ca/~mmueller/ps/2016/2016-Integrating-Factorization-Ranked.pdf)
by our old friends Chenjun Xiao and Martin Muller:

    "We apply a variant of PUCT [11] formula which is used in AlphaGo [12] to
    integrate FBT knowledge in MCTS. ...."

But then the formula that they give differs:

    argmax(Q(s,a) + Cpuct * P(s,a) * sqrt(lg(N(s)) / (1 + N(s,a))))
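
For reference, the selection rule in that variant could be sketched in Python
roughly as follows. This is only an illustration: select_action and its
arguments are hypothetical names, "lg" is assumed to mean log base 2, N(s) is
assumed to be the sum of the child visit counts, and the guard for N(s) <= 1
is my own addition so the logarithm never goes negative or undefined.

```python
import math


def select_action(q, prior, visits, c_puct=1.0):
    """Return the index a maximizing
    Q(s,a) + c_puct * P(s,a) * sqrt(lg(N(s)) / (1 + N(s,a))).
    """
    # Assumption: N(s) is the total of the child visit counts.
    parent_visits = sum(visits)
    # Assumption: "lg" is log base 2; use 0 when N(s) <= 1 (guard added here).
    log_term = math.log2(parent_visits) if parent_visits > 1 else 0.0
    scores = [
        q_a + c_puct * p_a * math.sqrt(log_term / (1 + n_a))
        for q_a, p_a, n_a in zip(q, prior, visits)
    ]
    # Index of the highest-scoring action.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical values: two moves with different Q, prior and visit count.
    print(select_action([0.5, 0.4], [0.2, 0.8], [10, 2]))
```

Note that here both the logarithm and the division by 1 + N(s,a) are inside
the sqrt, unlike the formula printed in the AGZ paper.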

I am guessing that Chenjun and Martin decided (or knew) that the AGZ paper was 
incorrect and modified the equation accordingly.

Anyone remember anything about this?

-----Original Message-----
From: Computer-go <computer-go-boun...@computer-go.org> On Behalf Of Gian-Carlo 
Pascutto
Sent: Friday, March 9, 2018 4:48 AM
To: computer-go@computer-go.org
Subject: Re: [Computer-go] PUCT formula

On 08-03-18 18:47, Brian Sheppard via Computer-go wrote:
> I recall that someone investigated this question, but I don’t recall 
> the result. What is the formula that AGZ actually uses?

The one mentioned in their paper, I assume.

I investigated both that and the original from the referenced paper, but after 
tuning I saw little meaningful strength difference.

One thing of note is that (IIRC) the AGZ formula keeps scaling the exploration 
term by the policy prior forever. In the original formula, it is a diminishing 
term.

--
GCP
_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
