Thanks, Lukasz, for introducing such an interesting paper.

I have a question, though.  The second algorithm in Figures 1, 2 and 3 
is labeled UCB2 but is apparently called MOSS in Section 5 (and in 
Section 1).  Do you know which algorithm was actually used in the 
numerical experiments?

BTW, I guess that for MC Go programs the least "risky" algorithm may 
well be the best in practice, don't you think?

Hideki

Łukasz Lew: 
<capxt8e4pmwmvkiituyhhpbvavgeupgqlnnodyjoamfgo0uo...@mail.gmail.com>:
>KL-UCB algorithm
>http://arxiv.org/pdf/1102.2490v4.pdf
>
>"Thus, KL-UCB is optimal for Bernoulli distributions and strictly dominates
>α-UCB for any bounded reward distributions."
>http://www.princeton.edu/~sbubeck/SurveyBCB12.pdf (page 18)
-- 
Hideki Kato