No, the inputs are weights or strengths. The output is a factorization of the binarization of that input to 0/1; it is not probabilities, and not a factorization of the input itself. This explains the range of the output.
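To make that concrete, here is a minimal NumPy sketch of the preprocessing from the paper that Spark's implicitPrefs=True mode is based on (Hu, Koren & Volinsky, "Collaborative Filtering for Implicit Feedback Datasets"). The factor vectors at the end are made-up numbers, only to illustrate why a predicted score can fall outside [0, 1]:

    import numpy as np

    # Toy implicit-feedback counts: rows are users, columns are items.
    # These raw counts are what you pass to ALS with implicitPrefs=True.
    R = np.array([[5.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])

    alpha = 40.0

    # Preference matrix P: 1 wherever any interaction occurred, else 0.
    # This is the matrix the factorization reconstructs.
    P = (R > 0).astype(float)

    # Confidence matrix C = 1 + alpha * R: how strongly each 0/1 entry is
    # weighted in the objective. The counts enter here, not as the target.
    C = 1.0 + alpha * R

    # A predicted "rating" is just the dot product of a user factor vector
    # and an item factor vector. Nothing constrains it to [0, 1].
    x_u = np.array([0.3, -0.8])   # hypothetical rank-2 user factors
    y_i = np.array([0.5, 0.9])    # hypothetical rank-2 item factors
    print(x_u @ y_i)              # approx. -0.57, a valid implicit-ALS score

Spark builds P and C internally from the counts you pass in; you never construct them yourself.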
On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:

> When you say implicit ALS *is* factoring the 0/1 matrix, are you saying
> that for the implicit feedback algorithm we need to pass the input data
> as the preference matrix, i.e. a matrix of 0s and 1s?
>
> Then how will it calculate the confidence matrix, which is basically
> 1 + alpha * count_matrix? If we don't pass the actual counts (views,
> etc.), how does Spark calculate the confidence matrix?
>
> My understanding was that the input data for als.fit(implicitPrefs=True)
> is the actual count matrix of views/purchases. Am I going wrong here? If
> so, how is Spark calculating the confidence matrix without the actual
> count data?
>
> The original paper the Spark algorithm is based on needs the actual
> count data to build a confidence matrix, and it also needs the 0/1
> matrix, since the objective function uses both the confidence matrix
> and the 0/1 matrix to find the user and item factors.
>
> On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:
>
> > No, you can't interpret the output as probabilities at all. In
> > particular they may be negative. It is not predicting rating but
> > interaction. Negative means very strongly not predicted to interact.
> > No, implicit ALS *is* factoring the 0/1 matrix.
> >
> > On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com>
> > wrote:
> >
> > > OK. So we can kind of interpret the output as probabilities even
> > > though it is not modeling probabilities, so that we can use it with
> > > a binary classification evaluator.
> > >
> > > The way I understand it, per the algorithm, the predicted matrix is
> > > basically the dot product of the user factor and item factor
> > > matrices.
> > >
> > > But under what circumstances can the predicted ratings be negative?
> > > I can understand that if the user factor vector and item factor
> > > vector have negative terms, the dot product can be negative. But
> > > does a negative value make any practical sense? Per the algorithm,
> > > the dot product is the predicted rating, so a rating shouldn't be
> > > negative for it to make sense. Also, is a rating between 0 and 1 a
> > > normalized rating? Typically we expect a rating to be some real
> > > value like 2.3 or 4.5.
> > >
> > > Also, please note that for implicit feedback ALS we don't feed in a
> > > 0/1 matrix. We feed in the count matrix (discrete count values), and
> > > I am assuming Spark internally converts it into a preference matrix
> > > (1/0) and a confidence matrix = 1 + alpha * count_matrix.
> > >
> > > On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com>
> > > wrote:
> > >
> > > > No, ALS is not modeling probabilities. The outputs are
> > > > reconstructions of a 0/1 matrix. Most values will be in [0, 1],
> > > > but it's possible to get values outside that range.
> > > >
> > > > On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi
> > > > <tr.man...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I ran the ALS model for the implicit feedback case, then used
> > > > > the model's .transform method to predict the ratings for the
> > > > > original dataset. My dataset is of the form (user, item,
> > > > > rating).
> > > > >
> > > > > I see something like below:
> > > > >
> > > > >     predictions.show(5, truncate=False)
> > > > >
> > > > > Why is the last prediction value negative? Isn't the transform
> > > > > method giving the prediction (probability) of seeing the rating
> > > > > as 1? I had count data for the rating (implicit feedback), and
> > > > > for the validation dataset I binarized the rating (1 if > 0,
> > > > > else 0). My training data has positive ratings (it's basically
> > > > > the count of views of a video).
> > > > > I used the following to train:
> > > > >
> > > > >     als = ALS(rank=x, maxIter=15, regParam=y,
> > > > >               implicitPrefs=True, alpha=40.0)
> > > > >     model = als.fit(self.train)
> > > > >
> > > > > What does a negative prediction mean here, and is it OK to have
> > > > > that?
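For what it's worth, below is a minimal end-to-end sketch of the workflow discussed above. The toy data, hyperparameters, column names, and local SparkSession are illustrative placeholders, not taken from the code in this thread:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = (SparkSession.builder.master("local[2]")
             .appName("implicit-als").getOrCreate())

    # Raw view counts (implicit feedback). Pass the counts themselves, not
    # a 0/1 matrix; Spark derives preference/confidence from them.
    train = spark.createDataFrame(
        [(0, 0, 5.0), (0, 1, 4.0), (0, 2, 2.0), (1, 0, 3.0), (1, 1, 1.0)],
        ["user", "item", "rating"])

    # Held-out counts; a 0 means the user never viewed the item.
    val = spark.createDataFrame(
        [(0, 0, 3.0), (1, 2, 0.0)],
        ["user", "item", "rating"])

    als = ALS(rank=5, maxIter=15, regParam=0.1, implicitPrefs=True,
              alpha=40.0, userCol="user", itemCol="item", ratingCol="rating")
    model = als.fit(train)

    # Predictions are factor dot products: unbounded scores, possibly
    # negative. A more negative score means "less likely to interact".
    preds = model.transform(val)

    # Binarize the held-out counts and rank by the raw score (AUC); there
    # is no need to squash predictions into [0, 1]. ALS emits float
    # predictions, so cast to double for the evaluator.
    scored = (preds
              .withColumn("label", (F.col("rating") > 0).cast("double"))
              .withColumn("score", F.col("prediction").cast("double")))
    evaluator = BinaryClassificationEvaluator(rawPredictionCol="score",
                                              labelCol="label")
    print(evaluator.evaluate(scored))  # areaUnderROC by default

Because AUC only depends on the ordering of the scores, negative predictions cause no problem for this kind of evaluation.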