Thanks, Marcelo. That article confused me; thanks for correcting it and for the helpful tips.

I looked into the virtual memory usage. With jmap + jvisualvm I do not see 11.5 GB of virtual memory usage; it is much less. The 11.5 GB figure is what "top -p pid" reports for the SparkSubmit process.

The virtual memory consumed by a process is the total of everything that's in the process memory map. This includes data (eg, the Java heap), but also all of the shared libraries and memory-mapped files used by the program.
(source: http://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used )

If I run *pmap* on the SparkSubmit process ID, it shows many things mapped into that process, all of which count toward virtual memory (libgcc_s.so.1, libsunec.so, libnss_files-2.19.so, libmanagement.so, cldrdata.jar, datanucleus-core-3.2.10.jar, spark-assembly-1.5.0-SNAPSHOT-hadoop2.3.0.jar, locale-archive, rt.jar, sunjce_provider.jar, sunec.jar, sunpkcs11.jar, jsse.jar, libzip.so, ..., and lots of huge anon mappings).

That is the main reason why it adds up to 11.5 GB of virtual memory usage.

P.S. Setting "SPARK_PRINT_LAUNCH_CMD=1" did not have any effect.

On Tue, Jul 14, 2015 at 10:57 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov <elkhan8...@gmail.com>
> wrote:
>
>> While the program is running, these are the stats of how much memory each
>> process takes:
>>
>> SparkSubmit process : 11.266 *gigabyte* Virtual Memory
>>
>> ApplicationMaster process: 2303480 *byte *Virtual Memory
>>
>
> That SparkSubmit number looks very suspicious. In yarn-cluster mode,
> SparkSubmit doesn't do much and should not use a lot of memory. You could
> set "SPARK_PRINT_LAUNCH_CMD=1" before launching the app to see the exact
> java command line being used, and see whether it has any suspicious
> configuration. You could also use jmap to dump the heap and look at it with
> jvisualvm, and see if there's any low hanging fruit w.r.t. what's using the
> memory.
>
> Regarding the fork / exec comment, that's very misleading. OSes are very
> efficient when forking - they'll not copy the entire parent process,
> instead they'll do COW on memory pages that change. So if you do an exec
> right afterwards, you're basically copying very little memory.
>
> --
> Marcelo
>

--
Best regards,
Elkhan Dadashov
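
In case it helps anyone checking the same thing: a minimal Python sketch of a per-mapping breakdown similar to what pmap shows, assuming a Linux host where /proc/<pid>/maps is readable. It sums the size of every mapping by name, so the grand total should be roughly comparable to the VIRT column in top. The function names (virtual_size_by_mapping, read_maps) are made up for illustration and are not part of Spark or of any tool mentioned in this thread.

```python
from collections import defaultdict


def virtual_size_by_mapping(maps_text):
    """Sum mapped bytes per name from /proc/<pid>/maps-style text.

    Each line looks like:
        start-end perms offset dev inode [pathname]
    Mappings without a pathname are grouped under "[anon]".
    """
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) == 6 else "[anon]"
        totals[name] += end - start
    return dict(totals)


def read_maps(pid):
    """Read the raw mapping table for a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as handle:
        return handle.read()
```

With the SparkSubmit pid, something like sorted(virtual_size_by_mapping(read_maps(pid)).items(), key=lambda kv: -kv[1])[:20] would list the largest mappings first, which makes it easy to see how much of the total comes from jars, shared libraries and anonymous regions rather than from the Java heap.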