Thanks, Marcelo.

That article confused me. Thanks for the correction and the helpful tips.

I looked into the virtual memory usage. The heap dump (jmap + jvisualvm)
does not show 11.5 GB of usage; it shows much less. The 11.5 GB virtual
memory figure comes from running top -p <pid> on the SparkSubmit process.

The virtual memory consumed by a process is the total of everything that's
in the process memory map. This includes data (eg, the Java heap), but also
all of the shared libraries and memory-mapped files used by the program.
(source:
http://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used
)

But if I run *pmap* on the SparkSubmit process id, it shows many mappings
attached to that process, all of which count toward virtual memory
(libgcc_s.so.1, libsunec.so, libnss_files-2.19.so, libmanagement.so,
cldrdata.jar, datanucleus-core-3.2.10.jar,
spark-assembly-1.5.0-SNAPSHOT-hadoop2.3.0.jar, locale-archive, rt.jar,
sunjce_provider.jar, sunec.jar, sunpkcs11.jar, jsse.jar, libzip.so, ...,
and lots of huge anon mappings).

That is the main reason the virtual memory usage adds up to 11.5 GB.
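
For anyone who wants to reproduce the tally, here is a minimal Python
sketch. It assumes the default `pmap <pid>` output layout from procps
(address, size in K, permissions, mapping name). The function name
sum_pmap and the SAMPLE text with its sizes are made up for illustration;
they are not my actual output.

```python
import re
from collections import defaultdict

# One mapping line of default `pmap <pid>` output (assumed layout), e.g.
# "00007f3c00000000  65536K rw---   [ anon ]"
LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def sum_pmap(text):
    """Return (total_kb, {mapping_name: kb}) parsed from pmap output text."""
    by_name = defaultdict(int)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skips the "pid: command" header and the "total" line
        size_kb = int(m.group(2))
        name = m.group(4).strip()
        by_name[name] += size_kb
    return sum(by_name.values()), dict(by_name)


# Hypothetical sample, shaped like pmap output; the numbers are invented.
SAMPLE = """\
12345:   java org.apache.spark.deploy.SparkSubmit
0000000000400000      4K r-x-- java
00000000c0000000 1048576K rw---   [ anon ]
00007f3c00000000  65536K rw---   [ anon ]
00007f3c10000000   1024K r--s- rt.jar
00007f3c20000000     88K r-x-- libgcc_s.so.1
 total          1115228K
"""

if __name__ == "__main__":
    total, by_name = sum_pmap(SAMPLE)
    # Largest contributors first
    for name, kb in sorted(by_name.items(), key=lambda kv: -kv[1]):
        print(f"{kb:>10} K  {name}")
    print(f"{total:>10} K  total")
```

To use it on a real process, save the output of pmap <pid> to a file and
pass the file contents to sum_pmap. Grouping by mapping name makes it easy
to see how much of the total comes from anon mappings versus jars and
shared libraries.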

P.S: "SPARK_PRINT_LAUNCH_CMD=1" did not have any effect.

On Tue, Jul 14, 2015 at 10:57 AM, Marcelo Vanzin <van...@cloudera.com>
wrote:

> On Tue, Jul 14, 2015 at 9:53 AM, Elkhan Dadashov <elkhan8...@gmail.com>
> wrote:
>
>> While the program is running, these are the stats of how much memory each
>> process takes:
>>
>> SparkSubmit process : 11.266 *gigabyte* Virtual Memory
>>
>> ApplicationMaster process: 2303480 *byte* Virtual Memory
>>
>
> That SparkSubmit number looks very suspicious. In yarn-cluster mode,
> SparkSubmit doesn't do much and should not use a lot of memory. You could
> set "SPARK_PRINT_LAUNCH_CMD=1" before launching the app to see the exact
> java command line being used, and see whether it has any suspicious
> configuration. You could also use jmap to dump the heap and look at it with
> jvisualvm, and see if there's any low hanging fruit w.r.t. what's using the
> memory.
>
> Regarding the fork / exec comment, that's very misleading. OSes are very
> efficient when forking - they'll not copy the entire parent process,
> instead they'll do COW on memory pages that change. So if you do an exec
> right afterwards, you're basically copying very little memory.
>
> --
> Marcelo
>



-- 

Best regards,
Elkhan Dadashov
