This is exactly the core problem in the linked issue: normally you would use TrainValidationSplit or CrossValidator to do hyper-parameter selection using cross-validation. You could tune the factor size (rank), the regularization parameter, and alpha (for implicit preference data), for example.
Because of the NaN issue you cannot currently use the cross-validators with ALS, so you would have to do the tuning yourself manually (dropping the NaNs from the prediction results, as Krishna says).

On Mon, 25 Jul 2016 at 11:40 Rohit Chaddha <rohitchaddha1...@gmail.com> wrote:

> Hi Krishna,
>
> Great .. I had no idea about this. I tried your suggestion by using
> na.drop() and got an RMSE = 1.5794048211812495.
> Any suggestions on how this can be reduced and the model improved?
>
> Regards,
> Rohit
>
> On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar <ksanka...@gmail.com> wrote:
>
>> Thanks Nick. I also ran into this issue.
>> VG, one workaround is to drop the NaNs from the predictions (df.na.drop())
>> and then use that dataset for the evaluator. In real life, probably detect
>> the NaNs and recommend the most popular items over some window.
>> HTH.
>> Cheers
>> <k/>
>>
>> On Sun, Jul 24, 2016 at 12:49 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>
>>> It seems likely that you're running into
>>> https://issues.apache.org/jira/browse/SPARK-14489 - this occurs when
>>> the test dataset in the train/test split contains users or items that
>>> were not in the training set. Hence the model has no computed factors
>>> for those ids, and ALS 'transform' currently returns NaN for them. This
>>> in turn results in NaN for the evaluator result.
>>>
>>> I have a PR open on that issue that will hopefully address this soon.
>>>
>>> On Sun, 24 Jul 2016 at 17:49 VG <vlin...@gmail.com> wrote:
>>>
>>>> Ping. Does anyone have suggestions/advice for me?
>>>> It will be really helpful.
>>>>
>>>> VG
>>>>
>>>> On Sun, Jul 24, 2016 at 12:19 AM, VG <vlin...@gmail.com> wrote:
>>>>
>>>>> Sean,
>>>>>
>>>>> I did this just to test the model. When I split my data into 80%
>>>>> training and 20% test, I get a root-mean-square error = NaN,
>>>>> so I am wondering where I might be going wrong.
>>>>>
>>>>> Regards,
>>>>> VG
>>>>>
>>>>> On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>
>>>>>> No, that's certainly not to be expected. ALS works by computing a much
>>>>>> lower-rank representation of the input. It would not reproduce the
>>>>>> input exactly, and you don't want it to -- this would be seriously
>>>>>> overfit. This is why, in general, you don't evaluate a model on the
>>>>>> training set.
>>>>>>
>>>>>> On Sat, Jul 23, 2016 at 7:37 PM, VG <vlin...@gmail.com> wrote:
>>>>>> > I am trying to run ml.ALS to compute some recommendations.
>>>>>> >
>>>>>> > Just to test, I am using the same dataset for training the ALSModel
>>>>>> > and for predicting the results based on the model.
>>>>>> >
>>>>>> > When I evaluate the result using RegressionEvaluator I get a
>>>>>> > root-mean-square error = 1.5544064263236066.
>>>>>> >
>>>>>> > I think this should be 0. Any suggestions on what might be going wrong?
>>>>>> >
>>>>>> > Regards,
>>>>>> > Vipul