Hi,

I have been facing an unusual issue with Naive Bayes training: I run out of heap space during the training phase, even with limited data. I am running on a rudimentary standalone-mode cluster of two development machines. I read the data from an HBase table, convert the documents into TF-IDF vectors, and then feed the vectors to the Naive Bayes training API. The out-of-memory exception occurs during training.

The strange part is that I am able to train with much more data when I read the documents from the local disk on the driver machine.

To give an idea, these are my configuration settings and feature size:

Machines on cluster: 2
Cores: 12 (8 + 4)
Total executor memory: 13 GB (6.5 + 6.5)
Executor memory: 6 GB
Driver memory: 4 GB
Feature size: 43839
Categories/labels: 20
Parallelism: 100
spark.storage.memoryFraction: 0.0
spark.shuffle.memoryFraction: 0.8
Total text data size on disk: 13.2 MB

Please help me solve this issue. I am able to run the same training on these machines with Mahout, which is unnerving. I can't use the HashingTF available with Spark because of the resulting decrease in accuracy, but with this feature size I expect Spark to handle the job easily.

Thanks,
Jatin



-----
Novice Big Data Programmer
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Out-of-memory-exception-in-MLlib-s-naive-baye-s-classification-training-tp14809.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.