Hi,

I have been facing an unusual issue with Naive Bayes training: I run out of heap space during the training phase, even with limited data. I am running in standalone mode on a rudimentary cluster of two development machines. I read the data from an HBase table, convert the documents into TF-IDF vectors, and feed those vectors to the Naive Bayes training API (a simplified sketch of the pipeline is in the P.S. below). The out-of-memory exception occurs while training.

The strange part is that I am able to train on much more data when I read the documents from the local disk of the driver machine.

To give an idea, these are my configuration settings and feature size:

Machines in cluster: 2
Cores: 12 (8 + 4)
Total executor memory: 13 GB (6.5 + 6.5)
Executor memory: 6 GB
Driver memory: 4 GB
Feature size: 43,839
Categories/labels: 20
Parallelism: 100
spark.storage.memoryFraction: 0.0
spark.shuffle.memoryFraction: 0.8
Total text data size on disk: 13.2 MB

Please help me solve this issue. I am able to run the same training on these machines with Mahout, which is unnerving. I can't use the HashingTF that ships with Spark because of the resulting drop in accuracy, but with this feature size I would expect Spark to cope easily.

Thanks,
Jatin
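
P.S. For concreteness, here is a simplified sketch of the kind of pipeline I described. The function name, the vocabulary map, and the IDF array are placeholders rather than my actual code, and the HBase read (newAPIHadoopRDD plus parsing) is omitted:

import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// docs:  (label, tokens) pairs already read from HBase and tokenized.
// vocab: maps each of the ~43,839 terms to a fixed column index.
// idf:   precomputed IDF weight for each column index.
def trainNB(docs: RDD[(Double, Seq[String])],
            vocab: Map[String, Int],
            idf: Array[Double]): NaiveBayesModel = {
  val vocabSize = vocab.size
  val points = docs.map { case (label, tokens) =>
    // Per-document term frequencies, weighted by IDF.
    val tfidf = tokens.flatMap(vocab.get)
      .groupBy(identity)
      .map { case (idx, occ) => (idx, occ.size * idf(idx)) }
      .toSeq
    // Sparse vectors keep per-document memory proportional to the
    // number of distinct terms, not to the full 43,839-wide space.
    LabeledPoint(label, Vectors.sparse(vocabSize, tfidf))
  }
  // Note: vocab and idf are captured in the task closure here;
  // broadcasting them would be more economical, but this keeps the
  // sketch short.
  NaiveBayes.train(points, lambda = 1.0)
}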