Hi,

That's an error :). Olivier Cappé (one of the authors) replied very quickly and gave me a link to the correct version:
http://jmlr.org/proceedings/papers/v19/garivier11a/garivier11a.pdf

Also note that there is a typo (a misplaced inf sign) in Eqs. (1) and (2).
Hideki

Łukasz Lew: <CAPXT8E4ODjD07Qwci+eOuZ-Eozthjpcf2XM=wgMPF-a=re0...@mail.gmail.com>:
>On Tue, Jul 23, 2013 at 8:50 AM, Hideki Kato <[email protected]> wrote:
>
>> Thanks Lukasz,
>>
>> For introducing such an interesting paper.
>>
>> I have a question, though. The second algorithm in Figures 1, 2 and 3
>> is termed UCB2 but is apparently called MOSS in Section 5 (and 1). Do
>> you know which algorithm is actually used in the numerical
>> experiments?
>>
>
>I don't know, but you might mail the author.
>
>>
>> BTW, I guess for MC Go programs, possibly the least "risky" algorithm
>> would be the best in practice, wouldn't it?
>>
>
>I won't speculate. Only experiments can tell.
>
>>
>> Hideki
>>
>> Łukasz Lew: <
>> capxt8e4pmwmvkiituyhhpbvavgeupgqlnnodyjoamfgo0uo...@mail.gmail.com>:
>> >KL-UCB algorithm
>> >http://arxiv.org/pdf/1102.2490v4.pdf
>> >
>> >"Thus, KL-UCB is optimal for Bernoulli distributions and strictly
>> >dominates α-UCB for any bounded reward distributions."
>> >http://www.princeton.edu/~sbubeck/SurveyBCB12.pdf (page 18)
>>
-- 
Hideki Kato <mailto:[email protected]>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
