Hello Sean,

Thank you very much for the quick response. That helps me understand it much better!
Best regards,
Hiro

On Thu, Feb 25, 2016 at 6:59 PM, Sean Owen <so...@cloudera.com> wrote:
> This isn't specific to Spark; it's from the original paper.
>
> alpha doesn't do a whole lot, and it is a global hyperparam. It controls
> the relative weight of observed versus unobserved user-product
> interactions in the factorization. Higher alpha means it's much more
> important to faithfully reproduce the interactions that *did* happen as a
> "1" than to reproduce the interactions that *didn't* happen as a "0".
>
> I don't think there's a good rule of thumb about what value to pick; it
> can't be less than 0 (less than 1 doesn't make much sense either), and
> you might just try values between 1 and 100 to see what gives the best
> result.
>
> I think that generally sparser input needs higher alpha, and maybe
> someone will tell me that alpha really should be a function of the
> sparsity, but I've never seen that done.
>
>
> On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <mogwa...@gmail.com> wrote:
> > Hi, I've been doing some POC for CF in MLlib.
> > In my environment, the ratings are all implicit, so I am trying to use
> > the trainImplicit method (in Python).
> >
> > The trainImplicit method takes alpha as one of its arguments to specify
> > a confidence for the ratings, as described in
> > <http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html>,
> > but the alpha value is global for all the ratings, so I am not sure why
> > we need it. (If it were per rating, it would make sense to me, though.)
> >
> > What is the difference in setting different alpha values for exactly
> > the same data set?
> >
> > I would appreciate it if someone could give me a reasonable explanation
> > for this.
> >
> > Best regards,
> > Hiro
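For anyone finding this thread later: the role of the global alpha comes from the implicit-feedback ALS formulation Sean mentions, where each observed interaction r is given a confidence of roughly 1 + alpha * r, so alpha scales how much observed interactions outweigh the unobserved "zeros". Below is a minimal PySpark sketch of passing alpha to trainImplicit; the data, parameter values, and the choice of alpha=40.0 are made up for illustration only and are not a recommendation from this thread.

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="ImplicitALSAlphaExample")

# Hypothetical implicit-feedback data: (user, product, interaction strength),
# e.g. click or purchase counts rather than explicit ratings.
interactions = sc.parallelize([
    Rating(0, 10, 3.0),
    Rating(0, 11, 1.0),
    Rating(1, 10, 5.0),
    Rating(1, 12, 1.0),
    Rating(2, 11, 2.0),
])

# alpha is a single global hyperparameter: larger values make the model work
# harder to reproduce observed interactions as "1" relative to treating
# unobserved pairs as "0".
model = ALS.trainImplicit(interactions, rank=10, iterations=10,
                          lambda_=0.01, alpha=40.0)

# Recommend two products for user 0 with the trained model.
print(model.recommendProducts(0, 2))
```

In practice, as Sean suggests, alpha would be tuned by trying a range of values (for example between 1 and 100) and picking whatever gives the best held-out ranking metric for the data set at hand.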