+1 for serial run by default. Let's leave the others for the future.

On Fri, Oct 6, 2017 at 7:48 AM, Mohit Jaggi <mohitja...@gmail.com> wrote:
> +1 for serial run by default.
>
> Sent from my iPhone
>
> On Oct 5, 2017, at 3:36 PM, moon soo Lee <m...@apache.org> wrote:
>
> I'd like us to also consider simplicity of use.
>
> We can have two different modes, or two different run buttons for Serial
> or Parallel run. This gives the flexibility of choosing between two
> different schedulers as a benefit, but to make users understand the
> difference between the two run buttons, there must be a really good UI
> treatment.
>
> I see there's high user demand for running notebooks sequentially. And I
> think there are 3 action items in this discussion thread:
>
> 1. Change the current "Run all" button behavior from Parallel to Serial.
> 2. Provide both Parallel and Serial run buttons with a really good UI
> treatment.
> 3. Provide a DAG.
>
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3)
> in the future.
>
> So, why don't we try 1) first and keep discussing and polishing the ideas
> about 2) and 3)?
>
> Thanks,
> moon
>
> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <msegel_had...@hotmail.com>
> wrote:
>
>> Whoa!
>> Seems I walked into something.
>>
>> Herval,
>>
>> What do you suggest? A simple switch that runs everything in serial, or
>> everything in parallel?
>> That would be a very bad idea.
>>
>> I gave you an example of a class of solutions where you don't want that
>> behavior.
>> E.g. unit testing, where you have one setup and then run several unit
>> tests in parallel.
>>
>> If that's not enough for you… how about if you want to test
>> producer/consumer problems?
>>
>> Or if you want to define classes in one paragraph but then call on them
>> in later paragraphs. If everything runs in parallel from time 0, you
>> can't do this.
>>
>> So, if you want to do it right the first time… you need to establish a
>> way to control the dependencies between paragraphs. This isn't rocket
>> science. And frankly, it's not that complex.
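Michael's objection is that a single serial/parallel switch cannot express "one setup paragraph, then several tests in parallel", while a dependency graph can. Below is a minimal Python sketch of that kind of DAG scheduling, under stated assumptions: paragraphs are modeled as plain callables keyed by made-up IDs, dependencies as a dict of ID sets, and `run_dag` with its parameters is hypothetical, not a Zeppelin API. Each paragraph is submitted to a thread pool as soon as everything it depends on has finished (a Kahn-style topological ordering), and a cycle is reported as an error.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def run_dag(paragraphs: Dict[str, Callable[[], object]],
            deps: Dict[str, Set[str]],
            max_workers: int = 4) -> List[str]:
    """Run each paragraph once all paragraphs it depends on have finished.

    `paragraphs` maps a (hypothetical) paragraph ID to a callable; `deps` maps
    an ID to the IDs it waits for. Returns the IDs in the order they finished.
    """
    # Copy the dependency sets so the caller's dict is not mutated.
    remaining: Dict[str, Set[str]] = {pid: set(deps.get(pid, ())) for pid in paragraphs}
    unknown = {d for ds in remaining.values() for d in ds} - set(paragraphs)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    finished: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running: Dict[Future, str] = {}
        while remaining or running:
            # Submit every paragraph whose dependencies have all finished.
            for pid in [p for p, ds in remaining.items() if not ds]:
                del remaining[pid]
                running[pool.submit(paragraphs[pid])] = pid
            if not running:
                # Nothing is running and nothing is ready: the graph has a cycle.
                raise ValueError(f"dependency cycle among: {sorted(remaining)}")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                pid = running.pop(fut)
                fut.result()  # re-raise a failing paragraph's exception; nothing new is submitted after that
                finished.append(pid)
                # Unblock paragraphs that were waiting on this one.
                for ds in remaining.values():
                    ds.discard(pid)
    return finished
```

Serial "Run all" (moon's item 1) is just the special case where each paragraph depends only on the one before it, e.g. `deps = {ids[i]: {ids[i - 1]} for i in range(1, len(ids))}`, which is one way to see why item 1 does not block item 3.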
>>
>> BTW, this is the user list, not the dev list…
>>
>> Just saying… ;-)
>>
>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hfre...@twitter.com> wrote:
>>
>> "Nice to have" isn't a very strong requirement. I strongly suggest you
>> really, really think about this before you start pounding out an
>> overengineered solution to a non-issue :-)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>>> Yes…
>>> You have a bunch of unit tests you can run in parallel where you only
>>> need one constructor and one cleanup.
>>>
>>> I would strongly suggest that you really, really think about this long
>>> and hard before you start to pound code.
>>> It's going to be harder to back out and fix than if you take the time
>>> to think through the problem and not make a dumb mistake.
>>>
>>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hfre...@twitter.com> wrote:
>>>
>>> Did anyone request such a case ("running some in parallel and some in
>>> sequence")? I haven't seen any requests for this in the wild (nor on
>>> this thread), other than a theoretical "what if", which is totally fine
>>> when it doesn't introduce a lot of unnecessary complexity for little to
>>> no gain (which seems to be the case here).
>>>
>>> h
>>>
>>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com>
>>> wrote:
>>>
>>>> Because that simplicity doesn't work.
>>>>
>>>> You will want to run some things in serial and some things in parallel.
>>>>
>>>> Which is why you will need a dependency graph.
>>>>
>>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>>>>
>>>> Why do you need rules and graphs and any of that to support running
>>>> everything sequentially or everything in parallel?
>>>>
>>>> 3) Add a "run mode" to the note. If it's "sequential", run the
>>>> paragraphs one at a time, in the order they're defined.
>>>> If "parallel", run using the current scheme (as many at the same time
>>>> as the thread pool permits).
>>>>
>>>> Simpler and covers all cases, imo.
>>>>
>>>> ------------------------------
>>>> *From:* Polyakov Valeriy <v.polja...@tinkoff.ru>
>>>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* RE: Implementing run all paragraphs sequentially
>>>>
>>>> Let me try to summarize the discussion. Evidently, the current behavior
>>>> of running notes does not meet actual requirements. The most important
>>>> thing we need is the ability to run sequentially. However, at the same
>>>> time we want to keep the ability to run in parallel. We discussed that
>>>> the most suitable way to describe paragraph dependencies is a DAG
>>>> (directed acyclic graph). Therefore, these dependencies should be
>>>> defined in the note, and the running order should not depend on how we
>>>> launch it (button / scheduler / API). So our objectives are to implement
>>>> a "dependency definition engine" and to use it in the "run engine".
>>>> What are the options?
>>>> 1) Explicit dependency definition.
>>>> We could take as a rule that each paragraph waits for ALL previous
>>>> paragraphs to finish. Then we add a paragraph option "Wait for …" where
>>>> we can choose the paragraph that must finish before this one starts.
>>>> When the option is set, we start execution immediately after the
>>>> selected paragraph finishes. This pattern allows us to implement a fully
>>>> parallel DAG running order. What are the disadvantages? All of them come
>>>> down to the same thing: the dependency management process is not easy
>>>> for users to understand (and the functionality is probably redundant, in
>>>> my personal view). First, we would have to use the strange format of
>>>> paragraph IDs, which, in addition, is hidden.
>>>> We could come up with visible, readable paragraph ID aliases, but then
>>>> we would need duplicate checking. Second, in some scenarios we would
>>>> have to change existing dependencies (e.g. if you need to add a new
>>>> paragraph between one paragraph and a dependent group, you have to
>>>> change the "Wait for …" option for each paragraph in the group).
>>>> 2) Implicit dependency definition.
>>>>
>>>> We could take as a rule that each paragraph waits for ALL previous
>>>> paragraphs to finish. Then we add a paragraph option "Run in parallel
>>>> with previous", which lets us create groups of paragraphs that run in
>>>> parallel. This gives us sequential running of paragraph groups: group
>>>> by group, with the paragraphs inside each group running in parallel.
>>>> This approach is much easier for users to understand, but its obvious
>>>> drawback compared with "Explicit definition" is that the dependency
>>>> graph and the level of parallelism are not as powerful.
>>>> I am not sure which option, (1) or (2), is right to implement at the
>>>> moment. I hope to hear from the product visionaries which way to choose
>>>> and to get approval to start the implementation.
>>>> Thank you!
>>>>
>>>> *Valeriy Polyakov*
>>>>
>>>> *From:* Michael Segel [mailto:msegel_had...@hotmail.com]
>>>> *Sent:* Saturday, September 30, 2017 4:22 PM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* Re: Implementing run all paragraphs sequentially
>>>>
>>>> Sorry to jump in…
>>>>
>>>> If you want to run paragraphs in parallel, you are going to want to
>>>> have some sort of dependency graph. Think of a common setup where you
>>>> need to set up common functions and imports.
>>>> (setup of %spark.dep)
>>>>
>>>> A good example is if your notebook is a bunch of unit tests and you
>>>> need to build the common teardown / setup methods to be used by the
>>>> other paragraphs.
>>>>
>>>> If you're going to do that, you'll need to build out a metadata
>>>> structure where you can set up your dependencies as well as add things
>>>> like labels beyond the IDs (which only need to be unique within the
>>>> given notebook).
>>>>
>>>> Just my $0.02
>>>>
>>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>>>>
>>>> The current behavior is as parallel as possible.
>>>> The "Run all" button currently submits all paragraphs in a notebook to
>>>> each interpreter's own scheduler (FIFO or Parallel) at once, and each
>>>> interpreter's scheduler then runs its paragraphs.
>>>>
>>>> I think we can provide a "sequential" run button for easier use, which
>>>> submits one paragraph and waits for it to finish before submitting the
>>>> next.
>>>>
>>>> And I think a sequential run button doesn't prevent us from having a
>>>> more complex / flexible DAG in the future, does it?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com>
>>>> wrote:
>>>>
>>>> What is the current behavior?
>>>>
>>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com>
>>>> wrote:
>>>>
>>>> At least in our case, the notebooks that we need to run sequentially
>>>> are expected to *always* run sequentially, so it makes more sense for
>>>> this to be a note option than a per-run mode.
>>>>
>>>> H
>>>>
>>>> _____________________________
>>>> From: moon soo Lee <m...@apache.org>
>>>> Sent: Thursday, September 28, 2017 9:03 PM
>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>> To: <users@zeppelin.apache.org>
>>>>
>>>> This is going to be really useful!
>>>>
>>>> Curious: why do you prefer a 'note option' instead of a 'run option'?
>>>> Could you compare their pros and cons?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com>
>>>> wrote:
>>>>
>>>> +1, our internal users at Twitter also often request this.
>>>>
>>>> ------------------------------
>>>> *From:* Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
>>>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>>>> *To:* users@zeppelin.apache.org
>>>> *Subject:* Implementing run all paragraphs sequentially
>>>>
>>>> Hello, users!
>>>>
>>>> At the moment our analysts often mix interpreters in their notes.
>>>> For example, they prepare data using %jdbc and then use it in %pyspark.
>>>> Besides, they often use scheduling for regular reporting, and they have
>>>> to do something like `time.sleep()` to wait for the data from %jdbc.
>>>> That doesn't guarantee the result and doesn't look good.
>>>>
>>>> You can find early attempts to implement sequential running of all
>>>> paragraphs in [1].
>>>> We are really interested in implementing the issue [2] and are ready to
>>>> solve it.
>>>>
>>>> It seems a good idea to discuss the requirements.
>>>> My idea is to introduce a note setting that defines the type of running
>>>> to use (parallel or sequential) and leave "Run all" as the only button
>>>> that runs all the cells in the note. This makes sequential or parallel
>>>> running a `note option`, not a `run option`.
>>>> The option will be controlled by a nearby button, as shown in the
>>>> attached image:
>>>>
>>>> [image: WRD000.jpg]
>>>>
>>>> For new notes the default state would be "Run sequential all"; for old
>>>> notes, "Run parallel for interpreters".
>>>>
>>>> We are glad to hear any thoughts.
>>>> Thank you.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>>
>>>> *Maksim Belousov*
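Maksim's proposal (like Herval's "run mode" suggestion quoted above) makes the run type a property of the note, with a single "Run all" entry point. Below is a minimal Python sketch of that idea, assuming paragraphs are plain callables; `Note`, `run_mode`, `effective_mode` and `run_all` are hypothetical names, not Zeppelin classes, and the parallel branch only approximates today's behavior with one thread pool rather than per-interpreter schedulers. New notes default to sequential, and notes saved without the setting keep parallel, matching the proposed defaults.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional

SEQUENTIAL = "sequential"
PARALLEL = "parallel"


@dataclass
class Note:
    """Hypothetical note model: paragraphs are plain callables."""
    paragraphs: List[Callable[[], object]]
    # Hypothetical note-level setting; None means the note predates the option.
    run_mode: Optional[str] = None

    @classmethod
    def new(cls, paragraphs: List[Callable[[], object]]) -> "Note":
        # Proposed default for newly created notes.
        return cls(paragraphs, run_mode=SEQUENTIAL)

    def effective_mode(self) -> str:
        # Old notes without the setting keep the parallel behavior.
        return self.run_mode or PARALLEL


def run_all(note: Note, max_workers: int = 4) -> List[object]:
    """Single "Run all" entry point; the note's setting decides how it runs."""
    mode = note.effective_mode()
    if mode == SEQUENTIAL:
        # Run each paragraph to completion before starting the next one.
        return [paragraph() for paragraph in note.paragraphs]
    if mode == PARALLEL:
        # Submit everything at once; the pool runs as many as it has threads for.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(paragraph) for paragraph in note.paragraphs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {mode!r}")
```

With the %jdbc / %pyspark example, a sequential note runs the %jdbc paragraph to completion before the %pyspark one starts, so no `time.sleep()` workaround is needed.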