Hi all,

If you want to launch a Spark job from Java programmatically, you need to
use SparkLauncher.
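For context, a minimal sketch of what such a launch looks like (the Spark
home path, jar path, and main class below are placeholders, not real
values; this uses Spark's org.apache.spark.launcher API and needs a Spark
distribution on the machine to actually run):

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchExample {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark")           // placeholder: local Spark install
                .setAppResource("/path/to/app.jar")   // placeholder: application jar
                .setMainClass("com.example.MyApp")    // placeholder: application main class
                .setMaster("local[2]")
                .setConf(SparkLauncher.DRIVER_MEMORY, "1g")
                .startApplication();                  // spawns spark-submit in a child process

        // Poll until the Spark application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("final state: " + handle.getState());
    }
}
```

Internally, startApplication() runs the spark-submit script in a child
process, which is where the process-creation cost discussed below comes in.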

SparkLauncher uses ProcessBuilder to create the new process, and Java
seems to handle process creation in an inefficient way:

"
When you execute a process, you must first fork() and then exec(). Forking
creates a child process by duplicating the current process. Then, you call
exec() to change the “process image” to a new “process image”, essentially
executing different code within the child process.
...
When we want to fork a new process, we have to copy the ENTIRE Java JVM…
What we really are doing is requesting the same amount of memory that the
JVM has been allocated.
"
Source: http://bryanmarty.com/2012/01/14/forking-jvm/ (the post also
covers different solutions for launching new processes in Java).
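To make the mechanism concrete, here is a minimal, self-contained
ProcessBuilder example (the same JDK API SparkLauncher uses internally).
On Linux, the JDK implements start() via fork, vfork, or posix_spawn plus
exec, depending on the JDK version and settings:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ProcessDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch a trivial child process; this is the point where the
        // fork()/exec() pair described above happens.
        ProcessBuilder pb = new ProcessBuilder("echo", "hello from child");
        Process child = pb.start();

        // Read the child's stdout.
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        // Wait for the child to exit and report its status.
        int exitCode = child.waitFor();
        System.out.println("child exited with " + exitCode);
    }
}
```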

If our main program's JVM already uses a large amount of memory (say
6 GB), then creating a new process via SparkLauncher requires another
6 GB of virtual memory to be available, i.e. 12 GB in total, even though
the child never actually uses it (the fork duplicates the parent's
address space before exec replaces it).
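Two mitigations I have seen suggested (both are assumptions worth
verifying on your own JDK and kernel, not something I can vouch for in
every environment) are switching the JDK's launch mechanism to
posix_spawn, and relaxing the kernel's overcommit accounting:

```shell
# Ask the JVM to use posix_spawn instead of fork when creating child
# processes, avoiding the duplicate reservation of the parent's address
# space. Note: jdk.lang.Process.launchMechanism is an internal,
# Unix-only property and its default varies by JDK version.
java -Djdk.lang.Process.launchMechanism=POSIX_SPAWN -jar myapp.jar

# Alternatively (Linux, requires root): always allow overcommit, so the
# fork's virtual reservation is not checked against available memory.
sysctl vm.overcommit_memory=1
```

The first keeps the fix inside the JVM invocation; the second changes
system-wide behavior and can mask genuine out-of-memory conditions.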

It would be very helpful if someone could share his/her experience
handling this memory inefficiency when creating new processes in Java.
