I created the following data file, data.file (one user/product pair per line):

    1 1
    1 2
    1 3
    2 4
    3 5
    4 6
    5 7
    6 1
    7 2
    8 8

The following code:

    from pyspark.mllib.recommendation import ALS, Rating

    def parse_line(line):
        # Each line is "user product"; every observed pair gets an implicit rating of 1.0.
        tokens = line.split(' ')
        return (int(tokens[0]), int(tokens[1])), 1.0

    lines = sc.textFile('./data.file')
    linesTest = sc.textFile('./data.file')
    trainingRDD = lines.map(parse_line)\
        .map(lambda l: Rating(l[0][0], l[0][1], l[1]))
    # The test set is the same (user, product) pairs as the training set.
    testRDD = linesTest.map(parse_line)\
        .map(lambda x: (x[0][0], x[0][1]))

    rank = 5
    numIterations = 5
    model = ALS.trainImplicit(ratings=trainingRDD, rank=rank, iterations=numIterations)
    res = model.predictAll(testRDD).collect()
    for item in res:
        print(item)

produces the following output:

    Rating(user=4, product=6, rating=0.6767983278562415)
    Rating(user=6, product=1, rating=0.620394043421327)
    Rating(user=8, product=8, rating=0.43915435032205224)
    Rating(user=2, product=4, rating=0.6712931344760976)
    Rating(user=1, product=2, rating=1.058575470286403)
    Rating(user=1, product=1, rating=1.0710334376535875)
    Rating(user=1, product=3, rating=0.7958297361341067)
    Rating(user=7, product=2, rating=0.6183187594872994)
    Rating(user=3, product=5, rating=0.862203908436539)
    Rating(user=5, product=7, rating=0.8487787055836538)

If I change this line

    res = model.predictAll(testRDD).collect()

to

    res = model.recommendProducts(1, 10)

the output is

    Rating(user=1, product=2, rating=1.0664127057236918)
    Rating(user=1, product=1, rating=1.054581213757793)
    Rating(user=1, product=3, rating=0.7844128375421406)
    Rating(user=1, product=6, rating=0.021054889001335786)
    Rating(user=1, product=7, rating=0.0190815148087915)
    Rating(user=1, product=8, rating=0.016932852980070745)
    Rating(user=1, product=5, rating=0.005659639719215903)
    Rating(user=1, product=4, rating=-0.007570583694108901)

Why do most of these ratings for user 1 (products 4 through 8) not show up when using predictAll?
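
For comparison, here is a minimal sketch in plain Python (no Spark needed) of which (user, product) pairs each call is asked to score for user 1. It assumes that predictAll only scores the pairs it is handed, while recommendProducts ranks every product the model has seen; the names `requested_for_user_1`, `all_products` and `full_request_for_user_1` are hypothetical and only there for illustration.

```python
# The (user, product) pairs from data.file.
pairs = [(1, 1), (1, 2), (1, 3), (2, 4), (3, 5),
         (4, 6), (5, 7), (6, 1), (7, 2), (8, 8)]

# Products that testRDD pairs with user 1, i.e. what predictAll is asked about.
requested_for_user_1 = sorted(p for (u, p) in pairs if u == 1)

# Every product in the data; assumed to be the candidate set that
# recommendProducts ranks for a given user.
all_products = sorted({p for (_, p) in pairs})

# Hypothetical request list covering every product for user 1.
full_request_for_user_1 = [(1, p) for p in all_products]

print(requested_for_user_1)     # [1, 2, 3]
print(all_products)             # [1, 2, 3, 4, 5, 6, 7, 8]
print(full_request_for_user_1)
```

If that assumption holds, passing something like `sc.parallelize(full_request_for_user_1)` to `model.predictAll` instead of `testRDD` should be the closer analogue of `recommendProducts(1, 10)`, though I have not verified that the scores match exactly.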