"nice to have" isn't a very strong requirement. I strongly suggest you really, really think about this before you start pounding out an overengineered solution to a non-issue :-)

h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Yes…
> You have a bunch of unit tests you can run in parallel where you only need
> one constructor and one cleanup.
>
> I would strongly suggest that you really, really think about this long and
> hard before you start to pound code.
> It's going to be harder to back out and fix than if you take the time to
> think thru the problem and not make a dumb mistake.
>
> On Oct 2, 2017, at 11:02 AM, Herval Freire <hfre...@twitter.com> wrote:
>
> Did anyone request such a case ("running some in parallel and some in
> sequence")? I haven't seen any requests for this in the wild (nor on this
> thread), other than theoretical "what if" - which is totally fine, when it
> doesn't introduce a lot of unnecessary complexity for little to no gain
> (which seems to be the case here)
>
> h
>
> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com>
> wrote:
>
>> Because that simplicity doesn’t work.
>>
>> You will want to run some things serial and some things in parallel.
>>
>> Which is why you will need a dependency graph.
>>
>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>>
>> Why do you need rules and graphs and any of that to support running
>> everything sequentially or everything in parallel?
>>
>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
>> one at a time, in the order they’re defined. If parallel, run using current
>> scheme (as many at the same time as the threadpool permits)
>>
>> Simpler and covers all cases, imo
>>
>> ------------------------------
>> *From:* Polyakov Valeriy <v.polja...@tinkoff.ru>
>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* RE: Implementing run all paragraphs sequentially
>>
>> Let me try to summarize the discussion. Evidently, the current behavior of
>> running notes does not meet actual requirements.
>> The most important thing
>> that we need is the ability to run sequentially. However, at the same
>> time we want to keep the functionality of parallel running. We discussed that
>> the most suitable way to build paragraph dependencies is a DAG
>> (directed acyclic graph). Therefore, this kind of dependency
>> should be defined in the note, and the running order should not depend on how we
>> launch it (button / scheduler / API). In this way, our objectives are to
>> implement a “dependency definition engine” and to use it in the “run engine”.
>> What are the options?
>>
>> 1) Explicit dependency definition.
>>
>> We could take as a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs. Then we add a paragraph option “Wait
>> for …” where we can choose the paragraph we wait for before starting
>> execution. When the option is set, we start execution immediately
>> after the end of execution of the selected paragraph. This pattern allows us to
>> implement a fully parallel DAG running order. What are the disadvantages? All
>> of them are about the same thing – the dependency management process is not
>> easy to understand from the perspective of users (and the functionality is
>> probably redundant – my personal view). First, we would have to use the strange
>> format of paragraph IDs, which in addition is hidden. We could come up with
>> visible and handsome paragraph ID aliases, but then we would need to
>> control duplicates. The second thing is the kind of scenario where we
>> have to change existing dependencies (e.g. you need to add a new paragraph
>> between one and its dependent group – you have to change the option “Wait for …”
>> for each paragraph in the group).
>>
>> 2) Implicit dependency definition.
>>
>> We could take as a rule that each paragraph should wait for the end of
>> execution of ALL previous paragraphs.
>> Then we add a paragraph option “Run in
>> parallel with previous” which allows us to create paragraph groups to run
>> in parallel. It turns out that we get sequential running of
>> paragraph groups – group by group, with the paragraphs in each group running in
>> parallel. This approach is much more understandable for the users, but the
>> obvious defect in comparison with “Explicit definition” is the fact that the
>> dependency graph and the level of parallelism are not as good.
>>
>> I am not sure which option (1) or (2) is correct to implement at the
>> moment. I hope to hear from product visionaries which way to choose and to
>> get approval for the start of implementation.
>>
>> Thank you!
>>
>> *Valeriy Polyakov*
>>
>> *From:* Michael Segel [mailto:msegel_had...@hotmail.com
>> <msegel_had...@hotmail.com>]
>> *Sent:* Saturday, September 30, 2017 4:22 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>> Sorry to jump in…
>>
>> If you want to run paragraphs in parallel, you are going to want to have
>> some sort of dependency graph. Think of a common set up where you need to
>> set up common functions and imports. (setup of %spark.dep)
>>
>> A good example is if your notebook is a bunch of unit tests and you need
>> to build the common tear down / set up methods to be used by the other
>> paragraphs.
>>
>> If you’re going to do that, you’ll need to build out a metadata structure
>> where you can set up your dependencies as well as add things like labels
>> beyond the ids (which only need to be unique to the given notebook.)
>>
>> Just my $0.02
>>
>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> Current behavior is as parallel as possible.
>> The run notebook button currently submits all paragraphs in a notebook into
>> each interpreter's own scheduler (FIFO, Parallel) at once. And each
>> interpreter's individual scheduler runs the paragraphs.
>>
>> I think we can provide a "sequential" run button for easier use, which
>> submits one paragraph and waits for it to finish before submitting the next.
>>
>> And I think a sequential run button doesn't stop us from having a more complex /
>> flexible DAG in the future?
>>
>> Thanks,
>> moon
>>
>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com>
>> wrote:
>>
>> What is the current behavior?
>>
>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com>
>> wrote:
>>
>> At least in our case, the notebooks that we need to run sequentially are
>> expected to *always* run sequentially - thus it makes more sense to be a
>> note option than a per-run mode
>>
>> H
>>
>> _____________________________
>> From: moon soo Lee <m...@apache.org>
>> Sent: Thursday, September 28, 2017 9:03 PM
>> Subject: Re: Implementing run all paragraphs sequentially
>> To: <users@zeppelin.apache.org>
>>
>> This is going to be really useful!
>>
>> Curious why do you prefer 'note option' instead of 'run option'?
>> Could you compare their pros and cons?
>>
>> Thanks,
>> moon
>>
>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com>
>> wrote:
>>
>> +1, our internal users at Twitter also often request this
>>
>> ------------------------------
>> *From:* Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Implementing run all paragraphs sequentially
>>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to make some regular reporting. And they
>> have to do something like `time.sleep()` to wait for the data from %jdbc. It
>> doesn't guarantee the result and doesn't look cool.
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>> It seems a good idea to discuss any requirements.
>> My idea is to introduce note setting that defines the type of running to
>> use (parallel or sequential) and leave "Run all" to be the only button
>> running all the cells in the note. This will make sequential or parallel
>> running the `note option` but not `run option`.
>> Option will be controlled by nearby button as shown
>>
>> <~WRD000.jpg>
>>
>> For new notes the default state would be "Run sequential all", for old -
>> "Run parallel for interpreters"
>>
>> We are glad to hear any thoughts.
>> Thank you.
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>> *Maksim Belousov*
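
For readers comparing the two dependency options summarized in the thread, here is a minimal Python sketch of how each could be turned into an ordered list of "stages" (groups of paragraphs that run in parallel, one group after another). This is not Zeppelin code: the function names, the paragraph names, and the `(name, parallel_with_previous)` and `wait_for` data shapes are all hypothetical, and the explicit variant is simplified to a plain "waits for" map rather than the "wait for all previous unless overridden" rule described above.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def group_into_stages(paragraphs):
    """Implicit option: (name, parallel_with_previous) pairs -> ordered stages."""
    stages = []
    for name, parallel_with_previous in paragraphs:
        if parallel_with_previous and stages:
            stages[-1].append(name)  # joins the group of the previous paragraph
        else:
            stages.append([name])    # starts a new group
    return stages


def stages_from_wait_for(wait_for):
    """Explicit option: {paragraph: set of paragraphs it waits for} -> stages."""
    sorter = TopologicalSorter(wait_for)
    sorter.prepare()                 # raises CycleError on circular waits
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose waits are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


def run_stages(stages, run_paragraph, max_workers=4):
    """Run stages in order; paragraphs inside one stage run concurrently."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            # list() forces the whole stage to finish before the next starts
            outputs = list(pool.map(run_paragraph, stage))
            results.update(zip(stage, outputs))
    return results


if __name__ == "__main__":
    # Hypothetical note: load data via %jdbc, build two reports, then summarize.
    implicit = [("load_jdbc", False), ("report_a", False),
                ("report_b", True), ("summary", False)]
    explicit = {"report_a": {"load_jdbc"}, "report_b": {"load_jdbc"},
                "summary": {"report_a", "report_b"}}
    print(group_into_stages(implicit))
    print(stages_from_wait_for(explicit))
    print(run_stages(group_into_stages(implicit), lambda name: name.upper()))
```

In this sketch both options reduce to the same stage list for the example note, which illustrates the trade-off raised in the thread: the implicit flag is easier to edit by hand, while the explicit map can express dependencies that a single linear sequence of groups cannot.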