Re: Accuracy hit in classification with Spark

2014-09-15 Thread Xiangrui Meng
Thanks for the update! -Xiangrui On Sun, Sep 14, 2014 at 11:33 PM, jatinpreet wrote: > Hi, > > I have been able to get the same accuracy with MLlib as Mahout's. The > pre-processing phase of Mahout was the reason behind the accuracy mismatch. > After studying and applying the same logic in my co

Re: Accuracy hit in classification with Spark

2014-09-14 Thread jatinpreet
Hi, I have been able to get the same accuracy with MLlib as Mahout's. The pre-processing phase of Mahout was the reason behind the accuracy mismatch. After studying and applying the same logic in my code, it worked like a charm. Thanks, Jatin - Novice Big Data Programmer -- View this mess

Re: Accuracy hit in classification with Spark

2014-09-09 Thread jatinpreet
I have also ran some tests on the other algorithms available with MLlib but got dismal accuracy. Is the method of creating LabeledPoint RDD different for other algorithms such as, LinearRegressionWithSGD? Any help is appreciated. - Novice Big Data Programmer -- View this message in context:

Re: Accuracy hit in classification with Spark

2014-09-09 Thread jatinpreet
Thanks for the information Xiangrui. I am using the following example to classify documents. http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/ I am not sure if this is the best way to convert textual data into vectors. Can you please confirm

Re: Accuracy hit in classification with Spark

2014-09-09 Thread Xiangrui Meng
If you are using the Mahout's Multinomial Naive Bayes, it should be the same as MLlib's. I tried MLlib with news20.scale downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html and the test accuracy is 82.4%. -Xiangrui On Tue, Sep 9, 2014 at 4:58 AM, jatinpreet wrot

Re: Accuracy hit in classification with Spark

2014-09-09 Thread jatinpreet
Hi, I tried running the classification program on the famous newsgroup data. This had an even more drastic effect on the accuracy, as it dropped from ~82% in Mahout to ~72% in Spark MLlib. Please help me in this regard as I have to use Spark in a production system very soon and this is a blocker

Re: Accuracy hit in classification with Spark

2014-09-09 Thread jatinpreet
Hi, I tried running the classification program on the famous newsgroup data. This had an even more drastic effect on the accuracy, as it dropped from ~82% in Mahout to ~72% in Spark MLlib. Please help me in this regard as I have to use Spark in a production system very soon and this is a blocker