Thanks for the quick feedback, Jeff.

Re: 1 - I did see ZEPPELIN-3563, but we are not on 0.8 yet, and we may also want to force FAIR execution instead of letting users control it.
Re: 2 - Is there an architecture issue here, or do we just need better thread safety? Ideally the scheduler should be able to figure out the dependencies and run whatever can run in parallel.

Re: interpreter mode - I may not have been clear, but we are running in per-user scoped mode, so the Spark context is shared among all users. Doesn't that mean all jobs from different users go to one FIFOScheduler, forcing all small jobs to block on a big one? That is specifically what we are trying to avoid.

Thanks
Ankit

On Tue, Jul 24, 2018 at 5:40 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Regarding 1. ZEPPELIN-3563 should be helpful. See
> https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
> for more details.
> https://issues.apache.org/jira/browse/ZEPPELIN-3563
>
> Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may
> hit weird issues if your paragraphs depend on each other, e.g.
> paragraph 1 uses variable v1, which is defined in paragraph p2. Then the
> order of paragraph execution matters, and ParallelScheduler cannot
> guarantee the order of execution.
> That's why we use FIFOScheduler for SparkInterpreter.
>
> In your scenario where multiple users share the same sparkcontext, I would
> suggest you use scoped per user mode. Then all users will share the same
> sparkcontext, which means you can save resources, and each user also gets
> their own FIFOScheduler, isolated from the others.
>
> On Wed, Jul 25, 2018 at 8:14 AM, Ankit Jain <ankitjain....@gmail.com> wrote:
>
>> Forgot to mention this is for shared scoped mode, so same Spark
>> application and context for all users on a single Zeppelin instance.
>>
>> Thanks
>> Ankit
>>
>> On Jul 24, 2018, at 4:12 PM, Ankit Jain <ankitjain....@gmail.com> wrote:
>>
>> Hi,
>> I am playing around with the execution policy of Spark jobs (and all
>> Zeppelin paragraphs, actually).
>>
>> Looks like there are a couple of control points:
>>
>> 1) Spark scheduling - FIFO vs Fair, as documented in
>> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools
>>
>> Since we are still on version 0.7 and don't have
>> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcefully doing
>> sc.setLocalProperty("spark.scheduler.pool", "fair");
>> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>>
>> Also, because we are exposing Zeppelin to multiple users, we may not
>> want to let any one user hog the cluster, so we would always use FAIR.
>>
>> This may complicate our merge to 0.8 though.
>>
>> 2) On top of Spark scheduling, each Zeppelin Interpreter itself seems to
>> have a scheduler queue. Each task is submitted to a FIFOScheduler, except
>> for SparkSqlInterpreter, which creates a ParallelScheduler if the
>> concurrentsql flag is turned on.
>>
>> I am changing SparkInterpreter.java to use ParallelScheduler too, and
>> that seems to do the trick.
>>
>> Now multiple notebooks are able to run in parallel.
>>
>> My question is whether other people have tested SparkInterpreter with
>> ParallelScheduler.
>> Also, ideally this should be configurable. Users should be able to
>> specify fifo or parallel.
>>
>> Executing all paragraphs does add more complication, and maybe
>> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep
>> the execution order sane.
>>
>> Thoughts?
>>
>> --
>> Thanks & Regards,
>> Ankit.

--
Thanks & Regards,
Ankit.
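
P.S. To make the blocking concern concrete, here is a minimal sketch of what I mean. It does not use Zeppelin's FIFOScheduler or ParallelScheduler classes; a single-thread executor and a two-thread pool from java.util.concurrent stand in for them, and the names SchedulerSketch and smallJobFinishesWhileBigRuns are made up for illustration. It only shows the general queueing behaviour, under the assumption that every user's paragraphs land on the same queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in: plain executors, not Zeppelin's scheduler classes.
final class SchedulerSketch {

    // Submits a "big" job that stays busy until released, then a "small" job.
    // Returns true if the small job completed while the big job was still running.
    static boolean smallJobFinishesWhileBigRuns(ExecutorService executor) throws Exception {
        CountDownLatch bigStarted = new CountDownLatch(1);
        CountDownLatch releaseBig = new CountDownLatch(1);
        try {
            Future<?> big = executor.submit(() -> {
                bigStarted.countDown();
                releaseBig.await(); // simulates a long-running job
                return null;
            });
            bigStarted.await(); // make sure the big job holds a worker thread

            Future<String> small = executor.submit(() -> "small done");

            boolean finished;
            try {
                small.get(500, TimeUnit.MILLISECONDS);
                finished = true;
            } catch (TimeoutException e) {
                finished = false; // still queued behind the big job
            }

            releaseBig.countDown(); // let the big job end
            big.get();
            small.get();
            return finished;
        } finally {
            releaseBig.countDown(); // no-op if already released
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // One worker thread: jobs run strictly in submission order.
        System.out.println("FIFO-like: "
                + smallJobFinishesWhileBigRuns(Executors.newSingleThreadExecutor()));
        // Two worker threads: the small job can run next to the big one.
        System.out.println("parallel-like: "
                + smallJobFinishesWhileBigRuns(Executors.newFixedThreadPool(2)));
    }
}
```

With one worker the small job waits for the big one, and with two workers it does not. That is the behaviour we want to avoid across users, although the parallel case gives up any ordering guarantee between dependent paragraphs, as Jeff pointed out.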