Thanks for the update! -Xiangrui
On Sun, Sep 14, 2014 at 11:33 PM, jatinpreet wrote:
> Hi,
>
> I have been able to get the same accuracy with MLlib as Mahout's. The
> pre-processing phase of Mahout was the reason behind the accuracy mismatch.
> After studying and applying the same logic in my code, it worked like a
> charm.
Hi,
I have been able to get the same accuracy with MLlib as Mahout's. The
pre-processing phase of Mahout was the reason behind the accuracy mismatch.
After studying and applying the same logic in my code, it worked like a
charm.
Thanks,
Jatin
-
Novice Big Data Programmer
I have also run some tests on the other algorithms available with MLlib but
got dismal accuracy. Is the method of creating a LabeledPoint RDD different
for other algorithms, such as LinearRegressionWithSGD?
Any help is appreciated.
-
Novice Big Data Programmer
Thanks for the information, Xiangrui. I am using the following example to
classify documents.
http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
I am not sure if this is the best way to convert textual data into vectors.
Can you please confirm
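
For readers following the question about converting text into vectors, here is a minimal pure-Python sketch of one common approach: hashed term frequencies. It is not the method from the linked post and not MLlib's API. The names `tokenize` and `hashed_term_frequencies`, the tokenization rule, the CRC32 hash, and the default vector size are all assumptions chosen for illustration.

```python
import re
import zlib
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_term_frequencies(text, num_features=1 << 10):
    # Map each token to a bucket index in [0, num_features) and count
    # how often each bucket is hit. CRC32 is used instead of the built-in
    # hash() so that indices are stable across Python processes.
    counts = Counter()
    for token in tokenize(text):
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    # Sparse representation: {feature_index: term_count}.
    return dict(counts)


vector = hashed_term_frequencies("Spark MLlib and Mahout: Spark vs. Mahout")
print(vector)
```

Whatever scheme is used, the training and test documents must go through exactly the same tokenization and indexing, otherwise accuracy figures from two libraries are not comparable.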
If you are using Mahout's Multinomial Naive Bayes, it should be
the same as MLlib's. I tried MLlib with news20.scale downloaded from
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
and the test accuracy is 82.4%. -Xiangrui
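
To make the comparison concrete, the following is a rough pure-Python sketch of textbook multinomial naive Bayes with additive smoothing. It is an illustration only, not the source of either Mahout or MLlib. The function names, the sparse `{index: count}` input format, and the default smoothing value of 1.0 are assumptions.

```python
import math
from collections import defaultdict


def train_multinomial_nb(samples, num_features, smoothing=1.0):
    # samples: list of (label, {feature_index: count}) pairs.
    doc_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * num_features)
    for label, features in samples:
        doc_counts[label] += 1
        for index, count in features.items():
            feature_sums[label][index] += count

    total_docs = len(samples)
    num_labels = len(doc_counts)
    model = {}
    for label, n_docs in doc_counts.items():
        sums = feature_sums[label]
        # Smoothed log prior of the class.
        log_prior = (math.log(n_docs + smoothing)
                     - math.log(total_docs + num_labels * smoothing))
        # Smoothed log likelihood of each feature given the class.
        log_denominator = math.log(sum(sums) + num_features * smoothing)
        log_likelihood = [math.log(s + smoothing) - log_denominator
                          for s in sums]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, features):
    # Pick the class with the highest log posterior (up to a constant).
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(count * log_likelihood[index]
                               for index, count in features.items())
    return max(model, key=score)


training = [(0, {0: 3, 1: 1}), (0, {0: 2}), (1, {2: 4}), (1, {1: 1, 2: 2})]
model = train_multinomial_nb(training, num_features=3)
print(predict(model, {0: 2}), predict(model, {2: 1}))
```

Since the model itself is this simple, a gap in accuracy between two implementations is more likely to come from how the input counts were produced than from the classifier.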
On Tue, Sep 9, 2014 at 4:58 AM, jatinpreet wrote:
Hi,
I tried running the classification program on the famous newsgroup data.
This had an even more drastic effect on the accuracy, as it dropped from
~82% in Mahout to ~72% in Spark MLlib.
Please help me in this regard as I have to use Spark in a production system
very soon and this is a blocker