That all looks correct.

On Thu, Dec 15, 2016 at 11:54 PM Manish Tripathi <tr.man...@gmail.com> wrote:

OK, thanks. So here is what I understood:

1) The input data to als.fit(implicitPrefs=True) is the actual strengths (count data). So if I have a matrix of (user, item, views/purchases), I pass that as the input, not the binarized preference matrix. This signifies the strength of the interaction.

2) Since we also pass the alpha parameter to als.fit(), Spark internally creates the confidence matrix as 1 + alpha * input_data (or some other alpha-based scaling).

3) The output it gives is basically a factorization of the 0/1 matrix (the binarized version of the initial input data), so the output also resembles the preference matrix (0/1), indicating interaction. Typically it should be between 0 and 1, but if it is negative it means very low preference/interaction.

*Does all the above sound correct?*

If yes, then one last question:

For an explicit dataset, where we don't use implicitPrefs=True, the predicted ratings would be actual ratings, e.g. 2.3, 4.5, etc., and not an interaction measure. That is because in the explicit case we don't use the confidence matrix and preference matrix concept; we use the actual rating data. So any output from Spark ALS for explicit data would be a rating prediction.
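In code, my understanding would be roughly this (a minimal sketch with toy data; the column names and values are just illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.getOrCreate()

    # Raw interaction strengths (view counts), NOT binarized 0/1 preferences.
    counts = spark.createDataFrame(
        [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 7.0), (2, 0, 3.0)],
        ["user", "item", "views"])

    als = ALS(rank=10, maxIter=15, regParam=0.1,
              implicitPrefs=True, alpha=40.0,
              userCol="user", itemCol="item", ratingCol="views")
    model = als.fit(counts)

    # The predictions approximate the binarized preference matrix, so they
    # are mostly in [0, 1] but can fall outside that range (even below 0).
    model.transform(counts).show()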
On Thu, Dec 15, 2016 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:

No, the input are weights or strengths. The output is a factorization of the binarization of that to 0/1, not probabilities, and not a factorization of the input. This explains the range of the output.

On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:

When you say implicit ALS *is* factoring the 0/1 matrix, are you saying that for the implicit feedback algorithm we need to pass the input data as the preference matrix, i.e. a matrix of 0s and 1s? Then how will it calculate the confidence matrix, which is basically 1 + alpha * count_matrix? If we don't pass the actual counts of values (views etc.), how does Spark calculate the confidence matrix?

I was under the impression that the input data for als.fit(implicitPrefs=True) is the actual count matrix of views/purchases. Am I going wrong here? If yes, how is Spark calculating the confidence matrix if it doesn't have the actual count data?

The original paper the Spark implementation is based on needs the actual count data to create a confidence matrix, and also needs the 0/1 matrix, since the objective function uses both the confidence matrix and the 0/1 matrix to find the user and item factors.

On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:

No, you can't interpret the output as probabilities at all. In particular they may be negative. It is not predicting rating but interaction. Negative means very strongly not predicted to interact. No, implicit ALS *is* factoring the 0/1 matrix.

On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com> wrote:

OK. So we can sort of interpret the output as probabilities even though it is not modeling probabilities? This is so I can use it with BinaryClassificationEvaluator.

The way I understand it, as per the algorithm, the predicted matrix is basically the dot product of the user-factor and item-factor matrices. But under what circumstances can the predicted ratings be negative? I can understand that if the individual user-factor and item-factor vectors have negative terms, the dot product can be negative. But does a negative value make any practical sense? As per the algorithm, the dot product is the predicted rating, so a rating shouldn't be negative for it to make any sense. Also, is a rating between 0 and 1 a normalized rating? Typically we expect a rating to be an arbitrary real value like 2.3 or 4.5.

Also, please note that for implicit feedback ALS we don't feed a 0/1 matrix. We feed the count matrix (discrete count values), and I am assuming Spark internally converts it into a preference matrix (0/1) and a confidence matrix = 1 + alpha * count_matrix.
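That is, something like the following (a NumPy sketch of the construction described in the paper, not Spark's actual code):

    import numpy as np

    alpha = 40.0
    R = np.array([[4.0, 0.0],   # raw counts r_ui
                  [0.0, 7.0]])
    P = (R > 0).astype(float)   # preference p_ui = 1 if r_ui > 0 else 0
    C = 1.0 + alpha * R         # confidence c_ui = 1 + alpha * r_ui

    # The objective minimizes the sum over (u, i) of
    #     c_ui * (p_ui - x_u . y_i)**2  plus regularization,
    # i.e. it factorizes P weighted by C -- not R itself.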
On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com> wrote:

No, ALS is not modeling probabilities. The outputs are reconstructions of a 0/1 matrix. Most values will be in [0, 1], but it's possible to get values outside that range.

On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi <tr.man...@gmail.com> wrote:

Hi,

I ran the ALS model for the implicit feedback case. Then I used the model's .transform method to predict the ratings for the original dataset. My dataset is of the form (user, item, rating).

I see something like below:

    predictions.show(5, truncate=False)

Why is the last prediction value negative? Isn't the transform method giving the prediction (probability) of seeing the rating as 1? I had count data for the rating (implicit feedback), and for the validation dataset I binarized the rating (1 if > 0 else 0). My training data has only positive ratings (it's basically the count of views of a video).

I used the following to train:

    als = ALS(rank=x, maxIter=15, regParam=y, implicitPrefs=True, alpha=40.0)
    model = als.fit(self.train)

What does a negative prediction mean here, and is it OK to have that?
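For reference, this is roughly how I binarize the validation set and evaluate (a sketch; model is the fitted ALS model from above, and validation stands in for my held-out (user, item, rating) DataFrame):

    from pyspark.sql import functions as F
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    preds = model.transform(validation)

    # Binarize the held-out counts (1.0 if rating > 0 else 0.0) and cast
    # the float 'prediction' column to double for the evaluator.
    scored = (preds
              .withColumn("label", (F.col("rating") > 0).cast("double"))
              .withColumn("score", F.col("prediction").cast("double")))

    evaluator = BinaryClassificationEvaluator(rawPredictionCol="score",
                                              labelCol="label")
    print(evaluator.evaluate(scored))  # areaUnderROC by default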