+1 for serial run by default.

> On Oct 5, 2017, at 3:36 PM, moon soo Lee <m...@apache.org> wrote:
>
> I'd like us to also consider simplicity of use.
>
> We can have two different modes, or two different run buttons, for serial
> and parallel runs. This gives the flexibility of choosing between two
> schedulers, but for users to understand the difference between the two run
> buttons, there must be really good UI treatment.
>
> I see there is high user demand for running notebooks sequentially. And I
> think there are 3 action items in this discussion thread.
>
> 1. Change the current "run all" button behavior from parallel to serial.
> 2. Provide both parallel and serial run buttons with really good UI
>    treatment.
> 3. Provide a DAG.
>
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3)
> in the future.
>
> So, why don't we try 1) first and keep discussing and polishing the ideas
> for 2) and 3)?
>
> Thanks,
> moon
>
>> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>> Whoa!
>> Seems I walked into something.
>>
>> Herval,
>>
>> What do you suggest? A simple switch that runs everything in serial, or
>> everything in parallel?
>> That would be a very bad idea.
>>
>> I gave you an example of a class of solutions where you don’t want that
>> behavior.
>> E.g. unit testing, where you have one setup and then run several unit
>> tests in parallel.
>>
>> If that’s not enough for you… how about if you want to test
>> producer/consumer problems?
>>
>> Or if you want to define classes in one paragraph but then call on them in
>> later paragraphs. If everything runs in parallel from time 0, you can’t do
>> this.
>>
>> So, if you want to do it right the first time… you need to establish a way
>> to control the dependency of paragraphs. This isn’t rocket science.
>> And frankly not that complex.
>>
>> BTW, this is the user list, not the dev list…
>>
>> Just saying… ;-)
>>
>>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hfre...@twitter.com> wrote:
>>>
>>> "Nice to have" isn't a very strong requirement. I strongly suggest you
>>> really, really think about this before you start pounding out an
>>> overengineered solution to a non-issue :-)
>>>
>>> h
>>>
>>>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <msegel_had...@hotmail.com>
>>>> wrote:
>>>> Yes…
>>>> You have a bunch of unit tests you can run in parallel where you only
>>>> need one constructor and one cleanup.
>>>>
>>>> I would strongly suggest that you really, really think about this long
>>>> and hard before you start to pound code.
>>>> It's going to be harder to back out and fix than if you take the time to
>>>> think through the problem and not make a dumb mistake.
>>>>
>>>>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hfre...@twitter.com> wrote:
>>>>>
>>>>> Did anyone request such a case ("running some in parallel and some in
>>>>> sequence")? I haven't seen any requests for this in the wild (nor on
>>>>> this thread), other than theoretical "what if" - which is totally fine,
>>>>> when it doesn't introduce a lot of unnecessary complexity for little to
>>>>> no gain (which seems to be the case here)
>>>>>
>>>>> h
>>>>>
>>>>>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel
>>>>>> <msegel_had...@hotmail.com> wrote:
>>>>>> Because that simplicity doesn’t work.
>>>>>>
>>>>>> You will want to run some things in serial and some things in parallel.
>>>>>>
>>>>>> Which is why you will need a dependency graph.
>>>>>>
>>>>>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>>>>>>>
>>>>>>> Why do you need rules and graphs and any of that to support running
>>>>>>> everything sequentially or everything in parallel?
>>>>>>>
>>>>>>> 3) add a “run mode” to the note. If it’s “sequential”, run the
>>>>>>> paragraphs one at a time, in the order they’re defined.
>>>>>>> If parallel, run using the current scheme (as many at the same time
>>>>>>> as the thread pool permits).
>>>>>>>
>>>>>>> Simpler and covers all cases, imo
>>>>>>>
>>>>>>>
>>>>>>> From: Polyakov Valeriy <v.polja...@tinkoff.ru>
>>>>>>> Sent: Monday, October 2, 2017 8:24:35 AM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: RE: Implementing run all paragraphs sequentially
>>>>>>>
>>>>>>> Let me try to summarize the discussion. Evidently, the current
>>>>>>> behavior of running notes does not meet actual requirements. The most
>>>>>>> important thing that we need is the ability to run sequentially.
>>>>>>> However, at the same time we want to keep the functionality of
>>>>>>> parallel running. We discussed that the most suitable way to express
>>>>>>> paragraph dependencies is a DAG (directed acyclic graph). Therefore,
>>>>>>> these dependencies should be defined in the note, and the running
>>>>>>> order should not depend on how we launch it (button / scheduler /
>>>>>>> API). In this way, our objectives are to implement a “dependency
>>>>>>> definition engine” and to use it in the “run engine”. What are the
>>>>>>> options?
>>>>>>>
>>>>>>> 1) Explicit dependency definition.
>>>>>>> We could take as a rule that each paragraph waits for ALL previous
>>>>>>> paragraphs to finish. Then we add a paragraph option “Wait for …”
>>>>>>> that selects the paragraph whose completion we wait for before
>>>>>>> starting. When the option is set, execution starts immediately after
>>>>>>> the selected paragraph finishes. This pattern allows us to implement
>>>>>>> a fully parallel DAG running order. What are the disadvantages? All
>>>>>>> of them are about the same thing: the dependency management process
>>>>>>> is not easy for users to understand (and the functionality is
>>>>>>> probably redundant, in my personal view). First, we would have to use
>>>>>>> the strange format of paragraph IDs, which in addition is hidden.
>>>>>>> We could come up with visible and handsome paragraph ID aliases, but
>>>>>>> then we would need to guard against duplicates. The second thing is
>>>>>>> scenarios where we have to change existing dependencies (e.g. you
>>>>>>> need to add a new paragraph between one paragraph and its dependent
>>>>>>> group: you have to change the “Wait for …” option for each paragraph
>>>>>>> in the group).
>>>>>>>
>>>>>>> 2) Implicit dependency definition.
>>>>>>> We could take as a rule that each paragraph waits for ALL previous
>>>>>>> paragraphs to finish. Then we add a paragraph option “Run in parallel
>>>>>>> with previous”, which allows us to create groups of paragraphs that
>>>>>>> run in parallel. The result is sequential running of paragraph
>>>>>>> groups: group by group, with the paragraphs inside each group running
>>>>>>> in parallel. This approach is much more understandable for users, but
>>>>>>> the obvious drawback compared with “Explicit definition” is that the
>>>>>>> dependency graph and the level of parallelism are not as flexible.
>>>>>>>
>>>>>>> I am not sure which option, (1) or (2), is the right one to implement
>>>>>>> at the moment. I hope to hear from product visionaries which way to
>>>>>>> choose and to get approval to start the implementation.
>>>>>>> Thank you!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Valeriy Polyakov
>>>>>>>
>>>>>>>
>>>>>>> From: Michael Segel [mailto:msegel_had...@hotmail.com]
>>>>>>> Sent: Saturday, September 30, 2017 4:22 PM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>>
>>>>>>> Sorry to jump in…
>>>>>>>
>>>>>>> If you want to run paragraphs in parallel, you are going to want to
>>>>>>> have some sort of dependency graph. Think of a common setup where you
>>>>>>> need to set up common functions and imports.
>>>>>>> (setup of %spark.dep)
>>>>>>>
>>>>>>> A good example is if your notebook is a bunch of unit tests and you
>>>>>>> need to build the common teardown / setup methods to be used by the
>>>>>>> other paragraphs.
>>>>>>>
>>>>>>> If you’re going to do that, you’ll need to build out a metadata
>>>>>>> structure where you can set up your dependencies as well as add
>>>>>>> things like labels beyond the IDs (which only need to be unique to
>>>>>>> the given notebook).
>>>>>>>
>>>>>>> Just my $0.02
>>>>>>>
>>>>>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>>>>>>>
>>>>>>> Current behavior is as parallel as possible.
>>>>>>> The run notebook button currently submits all paragraphs in a
>>>>>>> notebook to each interpreter's own scheduler (FIFO, Parallel) at
>>>>>>> once. Each interpreter's scheduler then runs the paragraphs.
>>>>>>>
>>>>>>> I think we can provide a "sequential" run button for easier use,
>>>>>>> which submits one paragraph and waits for it to finish before
>>>>>>> submitting the next.
>>>>>>>
>>>>>>> And I think a sequential run button doesn't stop us from having a
>>>>>>> more complex / flexible DAG in the future?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>
>>>>>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com>
>>>>>>> wrote:
>>>>>>> What is the current behavior?
>>>>>>>
>>>>>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com>
>>>>>>> wrote:
>>>>>>> At least in our case, the notebooks that we need to run sequentially
>>>>>>> are expected to *always* run sequentially - thus it makes more sense
>>>>>>> for this to be a note option than a per-run mode
>>>>>>>
>>>>>>> H
>>>>>>>
>>>>>>> _____________________________
>>>>>>> From: moon soo Lee <m...@apache.org>
>>>>>>> Sent: Thursday, September 28, 2017 9:03 PM
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>> To: <users@zeppelin.apache.org>
>>>>>>>
>>>>>>>
>>>>>>> This is going to be really useful!
>>>>>>>
>>>>>>> Curious, why do you prefer 'note option' instead of 'run option'?
>>>>>>> Could you compare their pros and cons?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>
>>>>>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com>
>>>>>>> wrote:
>>>>>>> +1, our internal users at Twitter also often request this
>>>>>>>
>>>>>>> From: Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
>>>>>>> Sent: Thursday, September 28, 2017 8:28:58 AM
>>>>>>> To: users@zeppelin.apache.org
>>>>>>> Subject: Implementing run all paragraphs sequentially
>>>>>>>
>>>>>>> Hello, users!
>>>>>>>
>>>>>>> At the moment our analysts often mix interpreters in their notes.
>>>>>>> For example, they prepare data using %jdbc and then use it in
>>>>>>> %pyspark. Besides, they often use scheduling to produce regular
>>>>>>> reports. And they have to do something like `time.sleep()` to wait
>>>>>>> for the data from %jdbc. It doesn't guarantee the result and doesn't
>>>>>>> look cool.
>>>>>>>
>>>>>>> You can find early attempts to implement sequential running of all
>>>>>>> paragraphs in [1].
>>>>>>> We are really interested in the implementation of issue [2] and are
>>>>>>> ready to work on it.
>>>>>>>
>>>>>>> It seems a good idea to discuss any requirements.
>>>>>>> My idea is to introduce a note setting that defines the type of
>>>>>>> running to use (parallel or sequential) and leave "Run all" as the
>>>>>>> only button that runs all the cells in the note. This will make
>>>>>>> sequential or parallel running a `note option` rather than a
>>>>>>> `run option`.
>>>>>>> The option will be controlled by a nearby button, as shown
>>>>>>>
>>>>>>> <~WRD000.jpg>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> For new notes the default state would be "Run sequential all", for
>>>>>>> old ones, "Run parallel for interpreters".
>>>>>>>
>>>>>>> We are glad to hear any thoughts.
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>>>>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Maksim Belousov
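
For reference, the "implicit dependency definition" from Valeriy's summary (option 2) could be sketched roughly as below. This is only an illustration of the grouping rule described in the thread, not Zeppelin code: the `Paragraph` class, the `parallel_with_previous` flag, and the `group_paragraphs` / `run_note` helpers are all hypothetical names, and a plain Python thread pool stands in for the interpreter schedulers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paragraph:
    """Hypothetical stand-in for a notebook paragraph (not Zeppelin's class)."""
    name: str
    run: Callable[[], None]
    # Assumed per-paragraph flag modelling "Run in parallel with previous".
    parallel_with_previous: bool = False


def group_paragraphs(paragraphs: List[Paragraph]) -> List[List[Paragraph]]:
    """Split paragraphs into consecutive groups.

    A flagged paragraph joins the group of the paragraph before it;
    an unflagged paragraph starts a new group.
    """
    groups: List[List[Paragraph]] = []
    for p in paragraphs:
        if p.parallel_with_previous and groups:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


def run_note(paragraphs: List[Paragraph], max_workers: int = 4) -> None:
    """Run groups one after another; paragraphs inside a group run concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in group_paragraphs(paragraphs):
            futures = [pool.submit(p.run) for p in group]
            for f in futures:
                # Wait for the whole group before starting the next one;
                # result() re-raises any exception from the paragraph.
                f.result()
```

With no flags set, every group has one paragraph, so this degenerates to the plain sequential "run all" proposed as action item 1. A note laid out as setup, test_a, test_b (flagged), report would run setup first, then the two tests together, then the report.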