Thanks for the further clarification, Jeff.
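
For reference, here is a minimal sketch of the scheduling model described in the quoted thread: one JVM, with one FIFO queue per user. It is a toy model built on plain `java.util.concurrent`, not Zeppelin code; `ScopedModeModel`, `submit`, and `demo` are hypothetical names, and the shared SparkContext is not modeled beyond the fact that all queues live in the same process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model of per-user scoped mode. None of these names are Zeppelin classes. */
class ScopedModeModel {
    // One single-threaded executor per user: a FIFO queue private to that user.
    private final Map<String, ExecutorService> fifoPerUser = new ConcurrentHashMap<>();
    // Completion log, in the order paragraphs actually finished.
    private final List<String> completed = new CopyOnWriteArrayList<>();

    Future<?> submit(String user, String paragraph, Runnable body) {
        ExecutorService fifo =
            fifoPerUser.computeIfAbsent(user, u -> Executors.newSingleThreadExecutor());
        return fifo.submit(() -> {
            body.run();
            completed.add(user + ":" + paragraph);
        });
    }

    /** Alice queues a big job and a small one; Bob queues a small one afterwards. */
    static List<String> demo() {
        ScopedModeModel jvm = new ScopedModeModel();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Alice's big job stays blocked until Bob's job has finished.
            jvm.submit("alice", "big", () -> awaitQuietly(release));
            jvm.submit("alice", "small", () -> { });
            // Bob's job sits in a different queue, so it completes regardless.
            jvm.submit("bob", "small", () -> { }).get(5, TimeUnit.SECONDS);
            release.countDown();
            for (ExecutorService fifo : jvm.fifoPerUser.values()) {
                fifo.shutdown();
                fifo.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (Exception e) {
            jvm.fifoPerUser.values().forEach(ExecutorService::shutdownNow);
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(jvm.completed);
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [bob:small, alice:big, alice:small]
    }
}
```

In this model Alice's small job waits behind her own big job, while Bob's job is unaffected. With a single FIFO queue shared by both users, Bob's job would be stuck behind Alice's big one.
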
> On Jul 26, 2018, at 8:11 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Let me rephrase it. In scoped mode, there are multiple Interpreter Groups
> (personally I prefer to call them multiple sessions) in one JVM (for the
> Spark interpreter, there are multiple SparkInterpreter instances).
> There is one SparkContext in this JVM, which is shared by all the
> SparkInterpreter instances. Regarding the scheduler, there are multiple
> schedulers in scoped mode in this JVM; each SparkInterpreter instance owns
> its own scheduler. Let me know if you have any other questions.
>
> Ankit Jain <ankitjain....@gmail.com> wrote on Wed, Jul 25, 2018 at 10:27 PM:
>> Jeff, what you said seems to be in conflict with what is detailed here -
>> https://medium.com/@leemoonsoo/apache-zeppelin-interpreter-mode-explained-bae0525d0555
>>
>> "In Scoped mode, Zeppelin still runs single interpreter JVM process but
>> multiple Interpreter Group serve each Note."
>>
>> In practice as well, we see one interpreter process for scoped mode.
>>
>> Can you please clarify?
>>
>> Adding Moon too.
>>
>> Thanks
>> Ankit
>>
>>> On Tue, Jul 24, 2018 at 11:09 PM, Ankit Jain <ankitjain....@gmail.com> wrote:
>>> Aah, that makes sense - so only jobs from the same user will block one
>>> another in the FIFOScheduler.
>>>
>>> By moving to ParallelScheduler, the only gain is that jobs from the same
>>> user can also run in parallel, but they may have dependency resolution
>>> issues.
>>>
>>> Just to confirm I have it right - if "Run all" on a notebook is not a
>>> requirement and users run one paragraph at a time from different notebooks,
>>> ParallelScheduler should be OK?
>>>
>>> Thanks
>>> Ankit
>>>
>>>> On Tue, Jul 24, 2018 at 10:38 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> 1. ZEPPELIN-3563 forces FAIR scheduling and just allows you to specify
>>>> the pool.
>>>> 2. The scheduler cannot figure out the dependencies between paragraphs.
>>>> That's why SparkInterpreter uses FIFOScheduler.
>>>> If you use per-user scoped mode, SparkContext is shared between users but
>>>> SparkInterpreter is not shared. That means there are multiple
>>>> SparkInterpreter instances that share the same SparkContext, but they
>>>> don't share the same FIFOScheduler; each SparkInterpreter uses its own
>>>> FIFOScheduler.
>>>>
>>>> Ankit Jain <ankitjain....@gmail.com> wrote on Wed, Jul 25, 2018 at 12:58 PM:
>>>>> Thanks for the quick feedback Jeff.
>>>>>
>>>>> Re:1 - I did see ZEPPELIN-3563, but we are not on .8 yet, and we may
>>>>> also want to force FAIR execution instead of letting the user control it.
>>>>>
>>>>> Re:2 - Is there an architecture issue here, or do we just need better
>>>>> thread safety? Ideally the scheduler should be able to figure out the
>>>>> dependencies and run whatever can be run in parallel.
>>>>>
>>>>> Re: Interpreter mode, I may not have been clear, but we are running
>>>>> per-user scoped mode - so the Spark context is shared among all users.
>>>>>
>>>>> Doesn't that mean all jobs from different users go to one FIFOScheduler,
>>>>> forcing all small jobs to block on a big one? That is specifically what
>>>>> we are trying to avoid.
>>>>>
>>>>> Thanks
>>>>> Ankit
>>>>>
>>>>>> On Tue, Jul 24, 2018 at 5:40 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>> Regarding 1. ZEPPELIN-3563 should be helpful. See
>>>>>> https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
>>>>>> for more details.
>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3563
>>>>>>
>>>>>> Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may
>>>>>> hit weird issues if your paragraphs depend on each other,
>>>>>> e.g. paragraph p1 uses variable v1, which is defined in paragraph p2.
>>>>>> Then the order of paragraph execution matters, and
>>>>>> ParallelScheduler cannot guarantee the order of execution.
>>>>>> That's why we use FIFOScheduler for SparkInterpreter.
>>>>>>
>>>>>> In your scenario where multiple users share the same SparkContext, I
>>>>>> would suggest you use scoped per-user mode. Then all users share
>>>>>> the same SparkContext, which means you save resources, and each user
>>>>>> has their own FIFOScheduler, isolated from the others.
>>>>>>
>>>>>> Ankit Jain <ankitjain....@gmail.com> wrote on Wed, Jul 25, 2018 at 8:14 AM:
>>>>>>> Forgot to mention this is for shared scoped mode, so the same Spark
>>>>>>> application and context for all users on a single Zeppelin instance.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ankit
>>>>>>>
>>>>>>>> On Jul 24, 2018, at 4:12 PM, Ankit Jain <ankitjain....@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am playing around with the execution policy of Spark jobs (and all
>>>>>>>> Zeppelin paragraphs, actually).
>>>>>>>>
>>>>>>>> It looks like there are a couple of control points:
>>>>>>>> 1) Spark scheduling - FIFO vs. Fair, as documented in
>>>>>>>> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.
>>>>>>>>
>>>>>>>> Since we are still on the .7 version and don't have
>>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcibly
>>>>>>>> doing sc.setLocalProperty("spark.scheduler.pool", "fair");
>>>>>>>> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>>>>>>>>
>>>>>>>> Also, because we are exposing Zeppelin to multiple users, we may not
>>>>>>>> want users to hog the cluster, so we would always use FAIR.
>>>>>>>>
>>>>>>>> This may complicate our merge to .8 though.
>>>>>>>>
>>>>>>>> 2) On top of Spark scheduling, each Zeppelin interpreter itself seems
>>>>>>>> to have a scheduler queue. Each task is submitted to a FIFOScheduler,
>>>>>>>> except in SparkSqlInterpreter, which creates a ParallelScheduler if
>>>>>>>> the concurrentsql flag is turned on.
>>>>>>>>
>>>>>>>> I am changing SparkInterpreter.java to use ParallelScheduler too, and
>>>>>>>> that seems to do the trick.
>>>>>>>>
>>>>>>>> Now multiple notebooks are able to run in parallel.
>>>>>>>>
>>>>>>>> My question is whether other people have tested SparkInterpreter with
>>>>>>>> ParallelScheduler. Also, ideally this should be configurable: the user
>>>>>>>> should be able to specify fifo or parallel.
>>>>>>>>
>>>>>>>> Executing all paragraphs does add more complication, and maybe
>>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep
>>>>>>>> the execution order sane.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & Regards,
>>>>>>>> Ankit.
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>> Ankit.
>>>
>>> --
>>> Thanks & Regards,
>>> Ankit.
>>
>> --
>> Thanks & Regards,
>> Ankit.
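
The ordering concern raised in the thread (one paragraph reads a variable v1 that another paragraph defines) can be illustrated with a small sketch. This is an assumption-laden toy using plain `java.util.concurrent`, not Zeppelin's FIFOScheduler or ParallelScheduler; `ParagraphOrderDemo` and `runNotebook` are hypothetical names, a single-threaded executor stands in for FIFO scheduling, and a two-thread pool stands in for parallel scheduling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model of two dependent paragraphs. Not Zeppelin's scheduler classes. */
class ParagraphOrderDemo {
    /** Submits "define v1" and then "read v1" to the pool; returns what the reader saw. */
    static String runNotebook(ExecutorService pool) {
        Map<String, String> vars = new ConcurrentHashMap<>();
        CountDownLatch readerDone = new CountDownLatch(1);
        try {
            // The defining paragraph is slow: it sets v1 only once the reader
            // has finished reading, or after a 500 ms timeout.
            pool.submit(() -> {
                awaitQuietly(readerDone);
                vars.put("v1", "42");
            });
            // The reading paragraph is submitted second and reads v1 immediately.
            Future<String> reader = pool.submit(() -> {
                String seen = vars.getOrDefault("v1", "undefined");
                readerDone.countDown();
                return seen;
            });
            return reader.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("fifo:     " + runNotebook(Executors.newSingleThreadExecutor()));
        System.out.println("parallel: " + runNotebook(Executors.newFixedThreadPool(2)));
    }
}
```

With the single-threaded executor the reader always runs after the definition and sees the value. With the two-thread pool the reader starts while the definition is still in progress and finds v1 missing, which is the kind of "weird issue" a parallel scheduler can cause for dependent paragraphs.
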