Github user debasish83 commented on the pull request:
https://github.com/apache/spark/pull/3098#issuecomment-61896998
@srowen RMSE is similar for the user and product views, so I did not call
the multiclass metrics. Also, per the earlier discussion with @mengxr, we are
still not sure whether recommendation should be treated as multiclass
classification.
I refactored the code to compute user MAP and product MAP separately. RMSE is
still computed the old way.
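For readers following along, here is a minimal plain-Python sketch of what "user MAP" and "product MAP" could mean. It is not the code in this PR. It assumes the usual definition of average precision (precision at each hit, averaged over the number of relevant items), under which every value lies in [0, 1]. The helper names `ranked_lists`, `relevant_sets`, and `mean_average_precision` are made up for illustration.

```python
from collections import defaultdict

def average_precision(ranked, relevant):
    """AP of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k, counted only at hits
    return total / len(relevant)

def ranked_lists(scored, key_index):
    """Group (user, product, score) triples by user (key_index=0) or by
    product (key_index=1) and sort each group by descending score."""
    groups = defaultdict(list)
    for triple in scored:
        groups[triple[key_index]].append((triple[2], triple[1 - key_index]))
    return {key: [other for _, other in sorted(rows, key=lambda r: -r[0])]
            for key, rows in groups.items()}

def relevant_sets(pairs, key_index):
    """Group held-out (user, product) pairs into one set per key."""
    groups = defaultdict(set)
    for pair in pairs:
        groups[pair[key_index]].add(pair[1 - key_index])
    return dict(groups)

def mean_average_precision(scored, held_out, key_index):
    """MAP over all keys that have at least one held-out item."""
    predicted = ranked_lists(scored, key_index)
    actual = relevant_sets(held_out, key_index)
    if not actual:
        return 0.0
    aps = [average_precision(predicted.get(key, []), rel)
           for key, rel in actual.items()]
    return sum(aps) / len(aps)

# Toy data (hypothetical): predicted scores and held-out positives.
scored = [(1, 10, 0.9), (1, 11, 0.8), (1, 12, 0.1), (2, 10, 0.2), (2, 11, 0.7)]
held_out = [(1, 10), (1, 12), (2, 11)]

user_map = mean_average_precision(scored, held_out, key_index=0)
product_map = mean_average_precision(scored, held_out, key_index=1)
```

The only difference between the two views is the grouping key: the user view ranks products per user, and the product view ranks users per product.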
Here are the results on MovieLens:
./bin/spark-submit \
  --master spark://tusca09lmlvt00c.uswin.ad.vzwcorp.com:7077 \
  --jars /Users/v606014/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar \
  --total-executor-cores 4 --executor-memory 4g --driver-memory 1g \
  --class org.apache.spark.examples.mllib.MovieLensALS \
  ./examples/target/spark-examples_2.10-1.2.0-SNAPSHOT.jar \
  --kryo --lambda 0.065 \
  hdfs://localhost:8020/sandbox/movielens/
2014-11-05 14:42:31.287 java[11464:1903] Unable to load realm mapping info from SCDynamicStore
14/11/05 14:42:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 799710, test: 200499.
Test RMSE = 0.8974937349724504 user MAP = 7.432139222366133 product MAP = 12.203619639224904.
Should I generate the test set with a more specific random sampling scheme?
Right now I sample the whole RDD to hold out 20% for test, but I could also
do user-based or product-based sampling. Did that help you before in Oryx?
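To make the user-based option concrete, here is a rough plain-Python sketch of a per-user holdout. It is not what the PR does today. The function name `split_per_user`, the 20% default, and the rule of keeping at least one training rating per user are my assumptions for illustration. On an RDD keyed by user, something like `sampleByKey` might play a similar role.

```python
import random
from collections import defaultdict

def split_per_user(ratings, test_fraction=0.2, seed=42):
    """Hold out roughly test_fraction of each user's ratings.

    ratings: iterable of (user, product, rating) tuples.
    Every user keeps at least one rating in the training set
    (an assumption of this sketch, not a rule from the PR).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        rows = list(rows)
        rng.shuffle(rows)
        # Floor of the requested share, capped so training is never empty.
        n_test = min(int(len(rows) * test_fraction), len(rows) - 1)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

# Toy data (hypothetical): user 1 has ten ratings, user 2 has one.
ratings = [(1, p, 4.0) for p in range(10)] + [(2, 99, 5.0)]
train, test = split_per_user(ratings)
```

Compared with sampling the whole RDD, this keeps every user represented in training, so the test set never contains a user the model has not seen. Swapping `row[0]` for `row[1]` gives the product-based variant.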