Has anyone actually asked for such a case ("running some in parallel and some in sequence")? I haven't seen any requests for it in the wild (nor on this thread), only theoretical "what ifs". Those are totally fine when they don't introduce a lot of unnecessary complexity for little to no gain, but that seems to be exactly the trade-off here.
h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com> wrote:
> Because that simplicity doesn’t work.
>
> You will want to run some things serial and some things in parallel.
>
> Which is why you will need a dependency graph.
>
> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>
> Why do you need rules and graphs and any of that to support running
> everything sequentially or everything in parallel?
>
> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
> one at a time, in the order they’re defined. If parallel, run using current
> scheme (as many at the same time as the threadpool permits)
>
> Simpler and covers all cases, imo
>
> ------------------------------
> *From:* Polyakov Valeriy <v.polja...@tinkoff.ru>
> *Sent:* Monday, October 2, 2017 8:24:35 AM
> *To:* users@zeppelin.apache.org
> *Subject:* RE: Implementing run all paragraphs sequentially
>
> Let me try to summarize the discussion. Evidently, current behavior of
> running notes does not meet actual requirements. The most important thing
> that we need is the ability of sequential running. However, at the same
> time we want to keep functionality of parallel running. We discussed that
> the most suitable solution of building paragraphs' dependencies is a DAG
> (directed acyclic graph). Therefore, surely, this kind of dependencies
> should be defined in note and the running order should not depend on how we
> launch it (button / scheduler / API). In this way, our objectives are to
> implement “dependency definition engine” and to use it in “run engine”.
> What are the options?
> 1) Explicit dependency definition.
> We could take for a rule that each paragraph should wait for the end of
> execution of ALL previous paragraphs. Then we add paragraph option “Wait
> for …” where we can choose paragraph for which we are waiting for to start
> execution.
> In the case where the option is set, we start execution immediately
> after the end of execution of selected paragraph. This pattern allows us to
> implement full-parallel DAG running order. What are the disadvantages? All
> of them are about the same – not easy understanding of the dependency
> management process from the perspective of users (and probably redundancy
> of the functionality – my personal view). First, we should use strange
> format of paragraph IDs, which in addition is hidden. We could come up with
> visible and handsome paragraph ID aliases, but then there is a need for
> duplication control. The second thing is in some kind of scenarios where we
> should change existing dependencies (e.g. you need to add new paragraph
> between one and dependent group – you have to change option “Wait for …”
> for each paragraph in group).
> 2) Implicit dependency definition.
>
> We could take for a rule that each paragraph should wait for the end of
> execution of ALL previous paragraphs. Then we add paragraph option “Run in
> parallel with previous” which allows us to create paragraph groups to run
> in parallel. It turns out that we have the way of sequential running of
> paragraph groups – group by group in which paragraphs run in parallel. This
> approach is much more understandable for the users, but the obvious defect
> in comparison with “Explicit definition” is the fact that dependency graph
> and level of parallelism are not so cool.
> I am not sure which option (1) or (2) is correct to implement at the
> moment. I hope to hear from product visionaries which way to choose and to
> get approval for the start of implementation.
> Thank you!
>
> *Valeriy Polyakov*
>
> *From:* Michael Segel [mailto:msegel_had...@hotmail.com]
> *Sent:* Saturday, September 30, 2017 4:22 PM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Implementing run all paragraphs sequentially
>
> Sorry to jump in…
>
> If you want to run paragraphs in parallel, you are going to want to have
> some sort of dependency graph. Think of a common set up where you need to
> set up common functions and imports. (setup of %spark.dep)
>
> A good example is if your notebook is a bunch of unit tests and you need
> to build the common tear down / set up methods to be used by the other
> paragraphs.
>
> If you’re going to do that, you’ll need to build out a metadata structure
> where you can set up your dependencies as well as add things like labels
> beyond the ids (which only need to be unique to the given notebook.)
>
> Just my $0.02
>
> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>
> Current behavior is as parallel as possible.
> Run notebook button currently submits all paragraphs in a notebook into
> each interpreter's own scheduler (FIFO, Parallel) at once. And each
> individual scheduler of interpreter runs the paragraphs.
>
> I think we can provide "sequential" run button for easier use, which
> submits one paragraph and waits for it to finish before submitting the next.
>
> And I think sequential run button doesn't stop having more complex /
> flexible DAG in the future?
>
> Thanks,
> moon
>
> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com> wrote:
>
> What is the current behavior?
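Michael's dependency graph and Valeriy's two options quoted above could share a single run engine: option (1) is an explicit DAG, and the groups from option (2), "Run in parallel with previous", can be translated into DAG edges. The Python sketch that follows is purely illustrative and is not Zeppelin code. `run_dag`, `groups_to_deps`, and modeling paragraphs as plain callables keyed by name are all hypothetical. It starts each paragraph as soon as everything it waits for has finished, and it rejects cycles because the graph must stay acyclic.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def run_dag(tasks: Dict[str, Callable[[], object]],
            deps: Dict[str, Set[str]],
            max_workers: int = 4) -> List[str]:
    """Hypothetical runner: start each task once every task it waits for is done.

    Returns task names in completion order. Raises ValueError on an unknown
    dependency or on a cycle.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    for name, waits_for in remaining.items():
        unknown = waits_for - set(tasks)
        if unknown:
            raise ValueError(f"{name} waits for unknown paragraphs {sorted(unknown)}")

    finished: List[str] = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies have all finished.
            for name in [n for n, w in remaining.items() if not w]:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                # Nothing is running and nothing can start: the rest form a cycle.
                raise ValueError(f"dependency cycle among {sorted(remaining)}")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise a failure instead of running dependents
                finished.append(name)
                for waits_for in remaining.values():
                    waits_for.discard(name)
    return finished


def groups_to_deps(order: List[str],
                   parallel_with_previous: Set[str]) -> Dict[str, Set[str]]:
    """Hypothetical translation of option (2) flags into option (1) edges.

    A paragraph without the flag starts a new group. Every paragraph waits for
    all paragraphs of the previous group, and therefore transitively for
    everything before it.
    """
    deps: Dict[str, Set[str]] = {}
    prev_group: List[str] = []
    group: List[str] = []
    for name in order:
        if group and name in parallel_with_previous:
            group.append(name)  # joins the previous paragraph's group
        else:
            if group:
                prev_group = group
            group = [name]
        deps[name] = set(prev_group)
    return deps
```

For a note `a, b, c, d` in which only `c` has the flag, `groups_to_deps` makes `b` and `c` wait for `a`, and makes `d` wait for both `b` and `c`. `run_dag` then runs `b` and `c` concurrently. Funnelling option (2) into option (1) this way keeps one run engine, at the cost of the coarser parallelism Valeriy mentions.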
>
> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com> wrote:
>
> At least in our case, the notebooks that we need to run sequentially are
> expected to *always* run sequentially, so it makes more sense for this to be a
> note option than a per-run mode.
>
> H
>
> _____________________________
> From: moon soo Lee <m...@apache.org>
> Sent: Thursday, September 28, 2017 9:03 PM
> Subject: Re: Implementing run all paragraphs sequentially
> To: <users@zeppelin.apache.org>
>
> This is going to be really useful!
>
> Curious why you prefer 'note option' instead of 'run option'?
> Could you compare their pros and cons?
>
> Thanks,
> moon
>
> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com> wrote:
>
> +1, our internal users at Twitter also often request this
>
> ------------------------------
> *From:* Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
> *Sent:* Thursday, September 28, 2017 8:28:58 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Implementing run all paragraphs sequentially
>
> Hello, users!
>
> At the moment our analysts often mix interpreters in their notes.
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling for regular reporting, and they
> have to do something like `time.sleep()` to wait for the data from %jdbc. It
> doesn't guarantee the result and doesn't look cool.
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
> We are really interested in implementing the issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
> My idea is to introduce a note setting that defines the type of running to
> use (parallel or sequential) and leave "Run all" to be the only button
> running all the cells in the note. This will make sequential or parallel
> running a `note option` rather than a `run option`.
> The option will be controlled by a nearby button.
>
> For new notes the default state would be "Run sequential all", for old ones
> "Run parallel for interpreters".
>
> We are glad to hear any thoughts.
> Thank you.
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
> *Maksim Belousov*
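The %jdbc-then-%pyspark problem and the proposed note setting can be modeled together in a small simulation. This Python sketch is hypothetical and is not Zeppelin's API. `Note`, `RunMode`, `run_all`, the `shared` dict, and the two paragraph functions are invented for illustration. Single-worker executors stand in for the per-interpreter FIFO schedulers moon describes, and a missing `run_mode` stands in for "an old note". The enum values reuse the button labels from the proposal.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class RunMode(Enum):
    SEQUENTIAL = "Run sequential all"
    PARALLEL = "Run parallel for interpreters"


# Hypothetical model of a paragraph: (interpreter name, callable).
Paragraph = Tuple[str, Callable[[], object]]


@dataclass
class Note:
    paragraphs: List[Paragraph] = field(default_factory=list)
    run_mode: Optional[RunMode] = None  # None: saved before the setting existed


def new_note() -> Note:
    return Note(run_mode=RunMode.SEQUENTIAL)  # proposed default for new notes


def effective_mode(note: Note) -> RunMode:
    # Proposed default for old notes: keep the current parallel behavior.
    return note.run_mode if note.run_mode is not None else RunMode.PARALLEL


# One single-worker (FIFO) executor per interpreter, standing in for the
# per-interpreter schedulers.
schedulers: Dict[str, ThreadPoolExecutor] = {
    "jdbc": ThreadPoolExecutor(max_workers=1),
    "pyspark": ThreadPoolExecutor(max_workers=1),
}


def run_all(note: Note) -> list:
    """The single "Run all" button; how it runs comes from the note setting."""
    if effective_mode(note) is RunMode.SEQUENTIAL:
        # Submit one paragraph, wait for it to finish, then submit the next.
        return [schedulers[i].submit(fn).result() for i, fn in note.paragraphs]
    # Parallel: submit everything at once. Order holds only within one
    # interpreter's queue, so a %pyspark paragraph does not wait for %jdbc.
    futures = [schedulers[i].submit(fn) for i, fn in note.paragraphs]
    return [f.result() for f in futures]


shared: Dict[str, list] = {}  # stands in for data %jdbc prepares for %pyspark


def jdbc_prepare():
    time.sleep(0.2)  # simulate a slow query
    shared["table"] = [1, 2, 3]
    return "prepared"


def pyspark_report():
    return sum(shared.get("table", []))
```

Starting from an empty `shared`, a note created with `new_note()` whose paragraphs are `[("jdbc", jdbc_prepare), ("pyspark", pyspark_report)]` returns `['prepared', 6]` from `run_all`. The same paragraphs in an old note (`run_mode=None`) run in parallel, and the report can come back as `0` because it may read before the table exists. That is the race the `time.sleep()` workaround only makes less likely.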