I'm trying to use Spark to process some data with native functions I've
integrated via JNI, and these functions allocate a lot of memory that I
pass around.  I'm not very familiar with the JVM, so I have a couple of
questions.

(1) Performance seemed terrible until I LD_PRELOAD'ed libtcmalloc.  Will
this break any JVM functionality?
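For reference, here's roughly how I'm wiring the preload in, so the override reaches the executor JVMs rather than just the driver (the tcmalloc path and app name are illustrative, not my exact setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: propagate LD_PRELOAD to the executor environments so tcmalloc
// replaces malloc/free before the worker JVMs load the JNI library.
// The library path is an assumption; it varies by distribution.
val conf = new SparkConf()
  .setAppName("native-jni-job")
  .setExecutorEnv("LD_PRELOAD", "/usr/lib/libtcmalloc.so")
val sc = new SparkContext(conf)
```

(Equivalently, I could set spark.executorEnv.LD_PRELOAD in the submit-time configuration instead of in code.)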
(2) Spark workers seem to OOM pretty readily.  How does Spark choose when to
write back its results (in my case to s3:// via saveAsObjectFile)?  I'm
guessing I can't set the JVM heap size to the full system memory, since I
need to leave room for the native allocations, but too small a heap doesn't
seem to work either.  Is there a way to get it to write back earlier than
usual so that I have more memory to spare?  I tried repartition, but that
triggers a shuffle.  In Hadoop I could just turn the number of mappers up
and it would compute the splits accordingly; I don't see why a shuffle has
to be involved here.
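For context, here's a sketch of the pipeline, where I've swapped repartition for an explicit minPartitions at read time — which I think is the closer analogue of raising the mapper count in Hadoop (the paths, the partition count, and nativeProcess are illustrative stand-ins, not my real code):

```scala
// Simplified pipeline sketch; names and values are placeholders.
// Asking for more partitions at read time yields more, smaller tasks
// up front -- so each task's results are written back sooner and less
// lives in memory at once -- without the shuffle that a later
// repartition() introduces.
val data = sc.objectFile[Array[Byte]]("s3://bucket/input", minPartitions = 512)

// nativeProcess is my JNI entry point; it allocates off-heap memory,
// which is why the JVM heap has to be sized below system memory.
val processed = data.map(bytes => nativeProcess(bytes))

processed.saveAsObjectFile("s3://bucket/output")
```

Is this the right knob, or is there a better way to bound the per-task memory footprint?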



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/JVM-heap-and-native-allocation-questions-tp12453.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
