I use the spark-submit script and the config files in a conf directory. I see
the memory settings reflected in stdout as well as in the web UI (it prints all
variables from spark-defaults.conf, and mentions I have 540GB of free memory
available when trying to store a broadcast variable or RDD). I also ran
"ps -aux | grep java | grep th", which shows me that java was called with
"-Xms1000g -Xmx1000g".
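A sanity check of the kind described above can be scripted; the sketch below pulls the -Xmx flag out of a java command line. The sample string is illustrative only (it stands in for real `ps` output, not anything captured from the actual system in this thread):

```shell
# Extract the -Xmx heap flag from a java command line, as one might do
# when grepping `ps -aux` output. The sample line below is a stand-in,
# not real output from the poster's system.
sample="java -Xms1000g -Xmx1000g org.apache.spark.deploy.SparkSubmit"
xmx=$(echo "$sample" | grep -o '\-Xmx[0-9]*[gm]')
echo "$xmx"
```

Against a live cluster one would pipe the real `ps` output through the same grep instead of using a sample string.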
I also tested whether these numbers are realistic for the J9 JVM. Outside of
Spark, setting just the initial heap size (-Xms) gives an error, but if I also
define the maximum (-Xmx) along with it, it seems to accept it. Also, in IBM's
J9 Health Center, I see it reserve the 900g and use up to 68g.

Thanks,
Tom

On 13 March 2015 at 02:05, Reynold Xin <r...@databricks.com> wrote:

> How did you run the Spark command? Maybe the memory setting didn't
> actually apply? How much memory does the web UI say is available?
>
> BTW - I don't think any JVM can actually handle a 700G heap ... (maybe Zing).
>
> On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen <thubregt...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm running the teraSort benchmark with a relatively small input set: 5GB.
>> During profiling, I can see I am using a total of 68GB. I've got a terabyte
>> of memory in my system, and set
>>
>>   spark.executor.memory 900g
>>   spark.driver.memory 900g
>>
>> I use the defaults for
>>
>>   spark.shuffle.memoryFraction
>>   spark.storage.memoryFraction
>>
>> I believe that I now have 0.2*900=180GB for shuffle and 0.6*900=540GB for
>> storage.
>>
>> I noticed a lot of variation in runtime (under the same load), and tracked
>> this down to this function in
>> core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala:
>>
>>   private def spillToPartitionFiles(collection:
>>       SizeTrackingPairCollection[(Int, K), C]): Unit = {
>>     spillToPartitionFiles(collection.iterator)
>>   }
>>
>> In a slow run, it loops through this function 12,000 times; in a fast run
>> only 700 times, even though the settings in both runs are the same and
>> there are no other users on the system. When I look at the function
>> calling this (insertAll, also in ExternalSorter), I see that
>> spillToPartitionFiles is only called 700 times in both fast and slow runs,
>> meaning that the function recursively calls itself very often.
>> Because of the function name, I assume the system is spilling to disk. As
>> I have sufficient memory, I assume that I forgot to set a certain memory
>> setting. Does anybody have any idea which other setting I have to set in
>> order to not spill data in this scenario?
>>
>> Thanks,
>>
>> Tom
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Spilling-when-not-expected-tp11017.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
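For reference, the budget arithmetic in the original question (0.2*900 for shuffle, 0.6*900 for storage) can be reproduced as below. This is only a sketch of the poster's own calculation, assuming the Spark 1.x defaults of 0.2 for spark.shuffle.memoryFraction and 0.6 for spark.storage.memoryFraction; the actual usable amounts may differ from these raw products:

```shell
# Reproduce the memory-budget arithmetic from the question above, using
# the assumed Spark 1.x fraction defaults (0.2 shuffle, 0.6 storage).
executor_gb=900
shuffle=$(awk -v m="$executor_gb" 'BEGIN{printf "%d", m*0.2}')
storage=$(awk -v m="$executor_gb" 'BEGIN{printf "%d", m*0.6}')
echo "shuffle budget: ${shuffle}GB, storage budget: ${storage}GB"
```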