Not a direct answer, but a bit of background: bandit theory was started in the early 1950s by Herbert Robbins (the same Robbins as in the 1985 paper). However, he did not prove the best possible bounds in that seminal paper.
Ingo.

-------- Original Message --------
> Date: Wed, 26 Oct 2011 11:23:54 +0200
> From: Petr Baudis <[email protected]>
> To: [email protected]
> Subject: [Computer-go] Multi-armed bandit problem theory

> Hi!
>
> Does anyone have a good source for understanding the theory behind
> the multi-armed bandit problem, i.e. the proof behind the exponential
> arm play bounds etc.? My only source so far is Auer et al., 2002:
> Finite-time Analysis of the Multiarmed Bandit Problem - but I suspect
> its description of the original bound is incomplete and/or simplified
> with some implicit assumptions (i.e. in case of optimal arm, the bound
> would involve division by zero?).
>
> Everyone refers to Lai & Robbins, 1985 and Agrawal, 1995, but I'm
> unable to find these papers anywhere (my university JSTOR subscription
> somehow magically doesn't seem to cover Agrawal, 1995). I'm hoping
> that maybe I could grasp the details if I read those, does anyone have
> a copy?
>
> Thanks,
>
> --
> Petr "Pasky" Baudis
> We live on an island surrounded by a sea of ignorance. As our island
> of knowledge grows, so does the shore of our ignorance. -- J. A. Wheeler
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
