I'm trying to use Spark to process some data with native functions I've integrated via JNI, and I pass around a lot of memory allocated inside those functions. I'm not very familiar with the JVM, so I have a couple of questions.
(1) Performance seemed terrible until I LD_PRELOAD'ed libtcmalloc. Will this break any JVM functionality?

(2) Spark workers seem to OOM pretty readily. How does Spark decide when to write its results back (in my case to s3:// via saveAsObjectFile)? I'm guessing I can't set the JVM heap size to the full system memory, since I need to leave room for the native allocations, but too small a heap doesn't seem to work either. Is there a way to get it to write back earlier than usual, so that I have more memory to spare?

I tried repartition, but that generates a shuffle. In Hadoop I could just turn the number of mappers up and it would compute the splits accordingly; I don't see why a shuffle has to be involved here. A stripped-down sketch of the job is below for reference.
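To make that concrete, here's roughly the shape of the job. The paths, the partition count, and the NativeLib wrapper are placeholders, not my real code; the real wrapper loads a native library and passes buffers through JNI. What I'm wondering is whether asking for more partitions at read time is the right substitute for repartition:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder for the real JNI wrapper; the actual version calls
    // System.loadLibrary and declares the method @native.
    object NativeLib {
      def process(line: String): Array[Byte] = line.getBytes
    }

    object NativeJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("native-job"))

        // Placeholder paths.
        val input  = "s3://bucket/input"
        val output = "s3://bucket/output"

        // The second argument to textFile is a minimum partition count; it
        // only changes how the input splits are computed, so no shuffle is
        // involved -- much like raising the mapper count in Hadoop.
        val records = sc.textFile(input, 512)

        // Smaller partitions would mean each task holds its native
        // allocations for a shorter time, leaving more headroom outside
        // the JVM heap.
        val results = records.map(line => NativeLib.process(line))

        results.saveAsObjectFile(output)
      }
    }

Is passing a larger minimum partition count at read time, rather than calling repartition afterwards, the intended way to get smaller tasks without a shuffle?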