A more concrete example:

I run the pi.py Spark Python example in *yarn-cluster* mode (via --master)
through SparkLauncher in Java.
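
For context, the launch itself looks roughly like this (a minimal
sketch; the Spark home and pi.py paths below are placeholders, not my
actual setup):

    import org.apache.spark.launcher.SparkLauncher;

    public class LaunchPi {
        public static void main(String[] args) throws Exception {
            Process spark = new SparkLauncher()
                .setSparkHome("/path/to/spark")    // placeholder path
                .setAppResource("/path/to/pi.py")  // placeholder path
                .setMaster("yarn-cluster")
                .launch();  // spawns spark-submit as a child process
            int exit = spark.waitFor();
            System.out.println("spark-submit exited with " + exit);
        }
    }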

While the program is running, here is how much virtual memory each
process takes:

SparkSubmit process: 11.266 *gigabytes* virtual memory

ApplicationMaster process: 2,303,480 *bytes* virtual memory

Why does the SparkSubmit process take so much virtual memory in
yarn-cluster mode? (This usually causes the YARN container to be killed
with an out-of-memory exception.)
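
As far as I understand, the kill itself comes from the NodeManager's
virtual-memory check, which is controlled by these yarn-site.xml
properties (the values shown are the defaults):

    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>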

On Tue, Jul 14, 2015 at 9:39 AM, Elkhan Dadashov <elkhan8...@gmail.com>
wrote:

> Hi all,
>
> If you want to launch Spark job from Java in programmatic way, then you
> need to Use SparkLauncher.
>
> SparkLauncher uses ProcessBuilder for creating the new process, and Java
> seems to handle process creation in an inefficient way.
>
> "
> When you execute a process, you must first fork() and then exec(). Forking
> creates a child process by duplicating the current process. Then, you call
> exec() to change the “process image” to a new “process image”, essentially
> executing different code within the child process.
> ...
> When we want to fork a new process, we have to copy the ENTIRE Java JVM…
> What we really are doing is requesting the same amount of memory the JVM
> has been allocated.
> "
> Source: http://bryanmarty.com/2012/01/14/forking-jvm/
> The same page also covers different solutions for launching new
> processes in Java.
>
> If our main program's JVM already uses a large amount of memory (say 6 GB),
> then creating a new process through SparkLauncher requires 12 GB of
> (virtual) memory to be available, even though the child will never
> actually use it.
>
> It would be very helpful if someone could share his/her experience
> handling this memory inefficiency when creating new processes in Java.
>
>
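
To make the fork cost concrete, this is essentially the child-process
launch that SparkLauncher performs under the hood (a minimal sketch; the
echo command is only a stand-in for spark-submit):

    import java.io.IOException;

    public class ForkCost {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            // ProcessBuilder.start() fork()s the JVM and then exec()s the
            // command, so the kernel must be prepared to commit a copy of
            // the parent's virtual address space, however briefly.
            ProcessBuilder pb = new ProcessBuilder("echo", "hello"); // stand-in
            pb.inheritIO();  // reuse the parent's stdin/stdout/stderr
            Process child = pb.start();
            child.waitFor();
        }
    }

Running this with a large heap (for example, java -Xmx6g ForkCost)
should make the effect easier to observe. Whether the JDK uses fork,
vfork, or posix_spawn depends on the JDK version and the
jdk.lang.Process.launchMechanism system property, so the overhead
varies across platforms.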


-- 

Best regards,
Elkhan Dadashov
