Hi all,

If you want to launch a Spark job from Java programmatically, you need to use SparkLauncher.
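For anyone new to it, a minimal sketch of what that looks like (the Spark home, jar path, class name, and master URL below are placeholders for your own setup):

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class LaunchFromJava {
        public static void main(String[] args) throws Exception {
            SparkAppHandle handle = new SparkLauncher()
                    .setSparkHome("/opt/spark")                 // placeholder Spark installation
                    .setAppResource("/path/to/your-app.jar")    // placeholder application jar
                    .setMainClass("com.example.YourSparkApp")   // placeholder main class
                    .setMaster("yarn")
                    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                    .startApplication();                        // spawns spark-submit via ProcessBuilder

            // Block until the launched application reaches a terminal state.
            while (!handle.getState().isFinal()) {
                Thread.sleep(1000);
            }
            System.out.println("Final state: " + handle.getState());
        }
    }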
The catch is that SparkLauncher uses ProcessBuilder to create the new process, and Java seems to handle process creation inefficiently:

"When you execute a process, you must first fork() and then exec(). Forking creates a child process by duplicating the current process. Then, you call exec() to change the 'process image' to a new 'process image', essentially executing different code within the child process. ... When we want to fork a new process, we have to copy the ENTIRE Java JVM... What we really are doing is requesting the same amount of memory the JVM has been allocated."

Source: http://bryanmarty.com/2012/01/14/forking-jvm/ (the post also walks through different ways of launching new processes from Java).

If our main program's JVM already uses a large amount of memory (say 6 GB), then creating a new process through SparkLauncher requires another 6 GB of (virtual) memory to be available, roughly 12 GB in total, even though the child will never actually use it.

It would be very helpful if someone could share their experience handling this memory inefficiency when creating new processes from Java.
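To make the concern concrete, here is a small sketch that reproduces the setup: a JVM holding a large live heap that then spawns a trivial child process. The ballast size and JVM flags are illustrative; whether the fork actually fails depends on your kernel's overcommit policy (on Linux, vm.overcommit_memory):

    import java.io.IOException;

    public class ForkMemoryDemo {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Hold a large live heap so the parent JVM's footprint is significant.
            // Run with e.g. -Xms6g -Xmx6g to mirror the 6 GB scenario above.
            byte[] ballast = new byte[1 << 30]; // ~1 GiB of ballast (illustrative)

            // The fork() behind ProcessBuilder momentarily accounts for a full
            // copy of the parent's address space before exec() replaces it.
            // Under strict overcommit (vm.overcommit_memory=2) this start() call
            // can fail with "Cannot allocate memory" even though the child is tiny.
            Process child = new ProcessBuilder("echo", "hello").inheritIO().start();
            child.waitFor();

            System.out.println("ballast bytes: " + ballast.length); // keep ballast reachable
        }
    }

As I understand it, with the default heuristic overcommit the fork usually succeeds, since the duplicated pages are copy-on-write and exec() follows immediately; the cost is reserved virtual memory rather than physical RAM. But I would still like to hear how others have dealt with this.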