Thanks, Lukasz, for introducing such an interesting paper.
I have a question, though. The second algorithm in Figures 1, 2 and 3 is labeled UCB2, but it is apparently called MOSS in Section 5 (and Section 1). Do you know which algorithm was actually used in the numerical experiments?

BTW, I guess that for MC Go programs the least "risky" algorithm may be the best in practice, don't you think?

Hideki

Łukasz Lew: <capxt8e4pmwmvkiituyhhpbvavgeupgqlnnodyjoamfgo0uo...@mail.gmail.com>:
>KL-UCB algorithm
>http://arxiv.org/pdf/1102.2490v4.pdf
>
>"Thus, KL-UCB is optimal for Bernoulli distributions and strictly dominates
>a-UCB for any
>bounded reward distributions."
>http://www.princeton.edu/~sbubeck/SurveyBCB12.pdf (page 18)
--
Hideki Kato <mailto:[email protected]>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
