Forgot to mention this is for shared scoped mode, so it's the same Spark 
application and context for all users on a single Zeppelin instance.

Thanks
Ankit

> On Jul 24, 2018, at 4:12 PM, Ankit Jain <ankitjain....@gmail.com> wrote:
> 
> Hi,
> I am playing around with the execution policy of Spark jobs (and of all 
> Zeppelin paragraphs, actually).
> 
> Looks like there are a couple of control points:
> 1) Spark scheduling - FIFO vs Fair as documented in 
> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.
> 
> Since we are still on version 0.7 and don't have 
> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcing 
> sc.setLocalProperty("spark.scheduler.pool", "fair");
> in both SparkInterpreter.java and SparkSqlInterpreter.java.
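
For the "fair" pool name above to carry custom settings, it would need to be 
defined in a fairscheduler.xml referenced by spark.scheduler.allocation.file; 
a minimal sketch, where the weight and minShare values are illustrative 
assumptions, not values from this thread:

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml (sketch): defines the "fair" pool set via
     sc.setLocalProperty("spark.scheduler.pool", "fair").
     weight/minShare are illustrative. -->
<allocations>
  <pool name="fair">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

(An undefined pool name still works, but it gets default settings.)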
> 
> Also, because we are exposing Zeppelin to multiple users, we don't want any 
> one user to hog the cluster, so we always use FAIR.
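
Note that pools only take effect when the scheduler mode itself is FAIR, and 
that can be set cluster-wide instead of in code; a sketch of the relevant 
spark-defaults.conf lines (the allocation-file path is a placeholder):

```
# spark-defaults.conf (sketch): enable fair scheduling for all jobs
spark.scheduler.mode              FAIR
# optional: custom pool definitions (path is an assumption)
spark.scheduler.allocation.file   /path/to/fairscheduler.xml
```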
> 
> This may complicate our merge to 0.8, though.
> 
> 2) On top of Spark scheduling, each Zeppelin interpreter itself seems to have 
> a scheduler queue. Each task is submitted to a FIFOScheduler, except for 
> SparkSqlInterpreter, which creates a ParallelScheduler if the concurrentsql 
> flag is turned on.
> 
> I am changing SparkInterpreter.java to use ParallelScheduler too and that 
> seems to do the trick.
> 
> Now multiple notebooks are able to run in parallel.
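
The behavioral difference between the two schedulers can be sketched with 
plain Java executors (a conceptual sketch, not Zeppelin's actual classes): 
FIFOScheduler behaves like a single-threaded executor, running one paragraph 
at a time, while ParallelScheduler behaves like a fixed-size thread pool:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class SchedulerSketch {

    // Submit 'tasks' jobs and report the peak number running at once.
    static int peakConcurrency(ExecutorService exec, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            exec.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(200); // stand-in for paragraph execution
                } catch (InterruptedException ignored) {
                }
                running.decrementAndGet();
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException ignored) {
        }
        exec.shutdown();
        return peak.get();
    }

    public static void main(String[] args) {
        // FIFOScheduler-like: one paragraph at a time, in submission order.
        System.out.println("fifo-like peak: "
                + peakConcurrency(Executors.newSingleThreadExecutor(), 3));
        // ParallelScheduler-like: paragraphs from several notebooks overlap.
        System.out.println("parallel-like peak: "
                + peakConcurrency(Executors.newFixedThreadPool(3), 3));
    }
}
```

This is why switching SparkInterpreter to a ParallelScheduler lets paragraphs 
from different notebooks overlap, with Spark's FAIR pools then arbitrating the 
cluster resources underneath.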
> 
> My question is whether other people have tested SparkInterpreter with 
> ParallelScheduler. Also, ideally this should be configurable: the user should 
> be able to specify FIFO or parallel.
> 
> Executing all paragraphs does add more complication, and maybe 
> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the 
> execution order sane.
> 
> Thoughts?
> 
> -- 
> Thanks & Regards,
> Ankit.
