Hi,

I had been using Mahout's Naive Bayes algorithm to classify documents.
For a specific training and test set, I was getting accuracy of around
86%. When I switched to Spark's MLlib, the accuracy dropped to around
82%.

I am using the same version of Lucene and the same logic to generate the
TF-IDF vectors. I tried adjusting the smoothing parameter, but to no avail.
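For reference, here is a minimal sketch of a multinomial Naive Bayes estimator with additive smoothing, showing where the smoothing parameter enters. It is plain Python, not code from either library. My understanding is that MLlib's multinomial `NaiveBayes` computes its log priors and per-class term log-probabilities in this form, with `lambda` playing the role of `lam` here. The function names are my own, and TF-IDF weights are treated as fractional, non-negative term counts.

```python
import math

def train_multinomial_nb(docs, labels, lam=1.0):
    """Fit multinomial Naive Bayes with additive smoothing `lam` (> 0).

    docs: list of dense, non-negative feature vectors (e.g. TF-IDF weights).
    labels: one class label per document.
    Returns (log_priors, log_likelihoods), both keyed by label.
    """
    num_features = len(docs[0])
    classes = sorted(set(labels))
    counts = {c: 0 for c in classes}
    feature_sums = {c: [0.0] * num_features for c in classes}
    for x, y in zip(docs, labels):
        counts[y] += 1
        for j, v in enumerate(x):
            feature_sums[y][j] += v
    n = len(docs)
    # Smoothed class priors: log((n_c + lam) / (n + K * lam)).
    log_priors = {
        c: math.log(counts[c] + lam) - math.log(n + len(classes) * lam)
        for c in classes
    }
    # Smoothed term probabilities: log((sum_j + lam) / (total + D * lam)).
    log_likelihoods = {}
    for c in classes:
        denom = math.log(sum(feature_sums[c]) + num_features * lam)
        log_likelihoods[c] = [math.log(s + lam) - denom for s in feature_sums[c]]
    return log_priors, log_likelihoods

def predict(model, x):
    """Return the label with the highest log-posterior score for vector x."""
    log_priors, log_likelihoods = model

    def score(c):
        return log_priors[c] + sum(v * w for v, w in zip(x, log_likelihoods[c]))

    return max(log_priors, key=score)
```

Because `lam` is added to every per-class feature sum, its effect depends on the scale of the TF-IDF weights. With small fractional weights, a smoothing value of 1.0 can outweigh the observed values, so the same nominal setting may not behave the same as it does on raw counts.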

My question is: if the underlying algorithm is the same in both Mahout and
MLlib, why am I seeing this drop in accuracy?
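One thing that can make "Naive Bayes" differ between the two libraries is the variant. Mahout ships both a standard and a complementary Naive Bayes classifier, while MLlib's `NaiveBayes` is, as far as I know, the multinomial model. Below is a minimal sketch of the complement variant as described by Rennie et al., which estimates each class's weights from the documents *outside* that class and then normalizes them. It is a textbook illustration, not Mahout's implementation, and the function names are hypothetical.

```python
import math

def train_complement_nb(docs, labels, alpha=1.0):
    """Complement Naive Bayes: for each class c, estimate smoothed term
    log-probabilities from all documents NOT labelled c, then normalize
    each weight vector by the sum of its absolute values."""
    num_features = len(docs[0])
    classes = sorted(set(labels))
    weights = {}
    for c in classes:
        comp_sums = [0.0] * num_features
        for x, y in zip(docs, labels):
            if y != c:
                for j, v in enumerate(x):
                    comp_sums[j] += v
        denom = sum(comp_sums) + num_features * alpha
        w = [math.log((s + alpha) / denom) for s in comp_sums]
        norm = sum(abs(v) for v in w)
        weights[c] = [v / norm for v in w]
    return weights

def predict_complement(weights, x):
    """Assign x to the class whose complement explains it worst,
    i.e. the class with the lowest weighted score."""
    return min(weights, key=lambda c: sum(v * w for v, w in zip(x, weights[c])))
```

On a two-class problem the two variants often agree, but with more classes and uneven class sizes they can produce different predictions, which would show up directly as an accuracy difference.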



-----
Novice Big Data Programmer
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Accuracy-hit-in-classification-with-Spark-tp13773.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.