Hi!
Does anyone have a good source for understanding the theory behind
the multi-armed bandit problem, e.g. the proofs behind the exponential
arm play bounds etc.? My only source so far is Auer et al., 2002:
Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
its description of the original bound is incomplete and/or simplified
with some implicit assumptions (e.g. for the optimal arm, the bound
would involve division by zero?).
Everyone refers to Lai & Robbins, 1985 and Agrawal, 1995, but I'm
unable to find these papers anywhere (my university JSTOR subscription
somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
that maybe I could grasp the details if I read those. Does anyone have
a copy?
Thanks,
--
Petr "Pasky" Baudis
We live on an island surrounded by a sea of ignorance. As our island
of knowledge grows, so does the shore of our ignorance. -- J. A. Wheeler
_______________________________________________
Computer-go mailing list
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go