That all looks correct.

On Thu, Dec 15, 2016 at 11:54 PM Manish Tripathi <tr.man...@gmail.com> wrote:

OK, thanks. So here is what I understood:

1) The input data to als.fit(implicitPrefs=True) is the actual strengths (count data). So if I have a matrix of (user, item, views/purchases), I pass that as the input, not the binarized preference matrix. This signifies the strength of the interaction.

2) Since we also pass the alpha parameter to als.fit(), Spark internally creates the confidence matrix as 1 + alpha * input_data (or some other alpha-based scaling).

3) The output it gives is basically a factorization of the 0/1 matrix (the binarized version of the initial input data), so the output also resembles the preference matrix (0/1), indicating interaction. Typically it should be between 0 and 1, but if it is negative it means very low preference/interaction.

*Does all the above sound correct?*

If yes, then one last question:

For an explicit dataset, where we don't use implicitPrefs=True, the predicted ratings would be actual ratings, e.g. 2.3, 4.5, etc., and not an interaction measure. That is because in the explicit case we don't use the confidence matrix and preference matrix concept; we use the actual rating data. So any output from Spark ALS for explicit data would be a rating prediction.
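In code, my understanding would be roughly this (a minimal sketch with toy data; the column names and values are just illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.getOrCreate()

    # Raw interaction strengths (view counts), NOT binarized 0/1 preferences.
    counts = spark.createDataFrame(
        [(0, 0, 4.0), (0, 1, 1.0), (1, 1, 7.0), (2, 0, 3.0)],
        ["user", "item", "views"])

    als = ALS(rank=10, maxIter=15, regParam=0.1,
              implicitPrefs=True, alpha=40.0,
              userCol="user", itemCol="item", ratingCol="views")
    model = als.fit(counts)

    # The predictions approximate the binarized preference matrix, so they
    # are mostly in [0, 1] but can fall outside that range (even below 0).
    model.transform(counts).show()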
On Thu, Dec 15, 2016 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:

No, the input are weights or strengths. The output is a factorization of the binarization of that to 0/1, not probabilities, and not a factorization of the input. This explains the range of the output.

On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:

When you say implicit ALS *is* factoring the 0/1 matrix, are you saying that for the implicit feedback algorithm we need to pass the input data as the preference matrix, i.e. a matrix of 0s and 1s? Then how will it calculate the confidence matrix, which is basically 1 + alpha * count_matrix? If we don't pass the actual counts of values (views etc.), how does Spark calculate the confidence matrix?

I was under the impression that the input data for als.fit(implicitPrefs=True) is the actual count matrix of views/purchases. Am I going wrong here? If yes, how is Spark calculating the confidence matrix if it doesn't have the actual count data?

The original paper the Spark implementation is based on needs the actual count data to create a confidence matrix, and also needs the 0/1 matrix, since the objective function uses both the confidence matrix and the 0/1 matrix to find the user and item factors.

On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:

No, you can't interpret the output as probabilities at all. In particular they may be negative. It is not predicting rating but interaction. Negative means very strongly not predicted to interact. No, implicit ALS *is* factoring the 0/1 matrix.

On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com> wrote:

OK. So we can sort of interpret the output as probabilities even though it is not modeling probabilities? This is so I can use it with BinaryClassificationEvaluator.

The way I understand it, as per the algorithm, the predicted matrix is basically the dot product of the user-factor and item-factor matrices. But under what circumstances can the predicted ratings be negative? I can understand that if the individual user-factor and item-factor vectors have negative terms, the dot product can be negative. But does a negative value make any practical sense? As per the algorithm, the dot product is the predicted rating, so a rating shouldn't be negative for it to make any sense. Also, is a rating between 0 and 1 a normalized rating? Typically we expect a rating to be an arbitrary real value like 2.3 or 4.5.

Also, please note that for implicit feedback ALS we don't feed a 0/1 matrix. We feed the count matrix (discrete count values), and I am assuming Spark internally converts it into a preference matrix (0/1) and a confidence matrix = 1 + alpha * count_matrix.
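That is, something like the following (a NumPy sketch of the construction described in the paper, not Spark's actual code):

    import numpy as np

    alpha = 40.0
    R = np.array([[4.0, 0.0],   # raw counts r_ui
                  [0.0, 7.0]])
    P = (R > 0).astype(float)   # preference p_ui = 1 if r_ui > 0 else 0
    C = 1.0 + alpha * R         # confidence c_ui = 1 + alpha * r_ui

    # The objective minimizes the sum over (u, i) of
    #     c_ui * (p_ui - x_u . y_i)**2  plus regularization,
    # i.e. it factorizes P weighted by C -- not R itself.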
On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com> wrote:

No, ALS is not modeling probabilities. The outputs are reconstructions of a 0/1 matrix. Most values will be in [0, 1], but it's possible to get values outside that range.

On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi <tr.man...@gmail.com> wrote:

Hi,

I ran the ALS model for the implicit feedback case. Then I used the model's .transform method to predict the ratings for the original dataset. My dataset is of the form (user, item, rating).

I see something like below:

    predictions.show(5, truncate=False)

Why is the last prediction value negative? Isn't the transform method giving the prediction (probability) of seeing the rating as 1? I had count data for the rating (implicit feedback), and for the validation dataset I binarized the rating (1 if > 0 else 0). My training data has only positive ratings (it's basically the count of views of a video).

I used the following to train:

    als = ALS(rank=x, maxIter=15, regParam=y, implicitPrefs=True, alpha=40.0)
    model = als.fit(self.train)

What does a negative prediction mean here, and is it OK to have that?
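For reference, this is roughly how I binarize the validation set and evaluate (a sketch; model is the fitted ALS model from above, and validation stands in for my held-out (user, item, rating) DataFrame):

    from pyspark.sql import functions as F
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    preds = model.transform(validation)

    # Binarize the held-out counts (1.0 if rating > 0 else 0.0) and cast
    # the float 'prediction' column to double for the evaluator.
    scored = (preds
              .withColumn("label", (F.col("rating") > 0).cast("double"))
              .withColumn("score", F.col("prediction").cast("double")))

    evaluator = BinaryClassificationEvaluator(rawPredictionCol="score",
                                              labelCol="label")
    print(evaluator.evaluate(scored))  # areaUnderROC by default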