Hi, I had been using Mahout's Naive Bayes algorithm to classify document data. For a specific train and test set, I was getting accuracy in the range of 86%. When I shifted to Spark's MLlib, the accuracy dropped to the vicinity of 82%.
I am using same version of Lucene and logic to generate TFIDF vectors. I tried fiddling with the smoothing parameter but to no avail. My question is if the underlying algorithm is same in both Mahout and MLlib, why this accuracy dip is being observed? ----- Novice Big Data Programmer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Accuracy-hit-in-classification-with-Spark-tp13773.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org