What behavior do you see? If it doesn't work for you, please create a
ticket and describe the details.


On Wed, Feb 21, 2018 at 6:04 PM, afancy <grou...@gmail.com> wrote:

> Hi,
>
> May I ask whether this feature is available? I just pulled from the master
> branch, but I haven't seen this implementation.
>
> Thanks
> /afancy
>
> On Sat, Oct 7, 2017 at 2:57 AM, Jianfeng (Jeff) Zhang <
> jzh...@hortonworks.com> wrote:
>
>>
>> Since almost everyone agrees on running serially by default, we could
>> implement that first. As for the parallel mode, we could leave it for the
>> future, although personally I prefer defining a DAG for the note.
>>
>>
>> Best Regards,
>> Jeff Zhang
>>
>>
>> From: Michael Segel <msegel_had...@hotmail.com>
>> Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>> Date: Friday, October 6, 2017 at 10:08 PM
>> To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>> Subject: Re: Implementing run all paragraphs sequentially
>>
>> Guys…
>>
>> 1) You’re posting this to the user list… Isn’t this a dev question?
>>
>> 2) +1 on the run serial… but doesn’t that already exist with the “run all
>> paragraphs” button?
>>
>> 3) -1 on a ‘run all in parallel’ button.  (It’s like putting lipstick on a
>> pig.)
>>
>> Are you really going to run all of the paragraphs in parallel?  You’re
>> not going to have a paragraph that is used to set things up? Import
>> external libraries?  Define classes/functions for future paragraphs to use?
>>
>> IMHO I would much rather see a DAG where each paragraph can set its
>> dependency… (this isn’t the right term. I’m trying to think back to how it
>> was described in NeXTStep Objective-C code.)
>> Then you could set your parallel button to run in parallel, but if your
>> paragraph is dependent on another, it’s blocked from executing until its
>> predecessor completes.
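
To make the "blocked until its predecessor completes" idea concrete, here is a minimal Python sketch. It is not Zeppelin code: `run_dag`, `paragraphs` and `depends_on` are hypothetical names, each paragraph is modelled as a plain callable, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(paragraphs, depends_on, max_workers=4):
    """Run paragraphs in parallel while honouring their dependencies.

    paragraphs: dict mapping a paragraph id to a zero-argument callable.
    depends_on: dict mapping a paragraph id to the ids it must wait for.
    Returns the ids in the order in which they finished.
    """
    sorter = TopologicalSorter(
        {pid: depends_on.get(pid, ()) for pid in paragraphs}
    )
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    finished = []
    running = {}  # future -> paragraph id
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every paragraph whose predecessors have all completed.
            for pid in sorter.get_ready():
                running[pool.submit(paragraphs[pid])] = pid
            # Block until at least one running paragraph finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                pid = running.pop(future)
                future.result()  # re-raise an error from the paragraph, if any
                finished.append(pid)
                sorter.done(pid)  # unblocks paragraphs that depend on pid
    return finished
```

With `depends_on = {"test_a": {"setup"}, "test_b": {"setup"}}`, the two tests may run side by side, but neither is submitted before `setup` has finished.
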
>>
>> But that’s just my $0.02
>>
>> On Oct 6, 2017, at 2:25 AM, Polyakov Valeriy <v.polja...@tinkoff.ru>
>> wrote:
>>
>> Thank you all for sharing the problem. Naman Mishra has started the
>> implementation of serial run in [1], so I propose we come back to the
>> discussion of the next step (both Parallel and Serial run buttons) after
>> [1] is resolved.
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>> *Valeriy Polyakov*
>>
>> *From:* Jeff Zhang [mailto:zjf...@gmail.com <zjf...@gmail.com>]
>> *Sent:* Friday, October 06, 2017 10:14 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>>
>> +1 for serial run by default.  Let's leave the others for the future.
>>
>> On Fri, Oct 6, 2017 at 7:48 AM, Mohit Jaggi <mohitja...@gmail.com> wrote:
>>
>> +1 for serial run by default.
>>
>> Sent from my iPhone
>>
>>
>> On Oct 5, 2017, at 3:36 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> I'd like us to also consider simplicity of use.
>>
>> We can have two different modes, or two different run buttons, for Serial
>> or Parallel run. This gives the flexibility of choosing between two
>> schedulers, but to make users understand the difference between the two
>> run buttons, there must be really good UI treatment.
>>
>> I see there's high user demand for running notebooks sequentially. And I
>> think there are 3 action items in this discussion thread.
>>
>> 1. Change the current run all button behavior from Parallel to Serial
>> 2. Provide both Parallel and Serial run buttons with really good UI
>> treatment.
>> 3. Provide a DAG
>>
>> I think 1) does not stop 2) and 3) in the future. 2) also does not stop
>> 3) in the future.
>>
>> So, why don't we try 1) first and keep discussing and polishing the ideas
>> about 2) and 3)?
>>
>>
>> Thanks,
>> moon
>>
>> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>> Whoa!
>> Seems I walked into something.
>>
>> Herval,
>>
>> What do you suggest?  A simple switch that runs everything in serial, or
>> everything in parallel?
>> That would be a very bad idea.
>>
>> I gave you an example of a class of solutions where you don’t want that
>> behavior.
>> E.g. unit testing, where you have one setup and then run several unit tests
>> in parallel.
>>
>> If that’s not enough for you… how about if you want to test
>> producer/consumer problems?
>>
>> Or if you want to define classes in one paragraph but then call on them
>> in later paragraphs. If everything runs in parallel from the start of time
>> 0, you can’t do this.
>>
>>
>> So, if you want to do it right the first time… you need to establish a
>> way to control the dependency of paragraphs. This isn’t rocket science.
>> And frankly not that complex.
>>
>> BTW, this is the user list not the dev list…
>>
>> Just saying…  ;-)
>>
>>
>>
>> On Oct 2, 2017, at 11:24 AM, Herval Freire <hfre...@twitter.com> wrote:
>>
>>  "nice to have" isn't a very strong requirement. I strongly suggest you
>> really, really think about this before you start pounding an overengineered
>> solution to a non-issue :-)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>> Yes…
>>  You have a bunch of unit tests you can run in parallel where you only need
>> one constructor and one cleanup.
>>
>> I would strongly suggest that you really, really think about this long
>> and hard before you start to pound code.
>> It’s going to be harder to back out and fix than if you take the time to
>> think thru the problem and not make a dumb mistake.
>>
>>
>> On Oct 2, 2017, at 11:02 AM, Herval Freire <hfre...@twitter.com> wrote:
>>
>> Did anyone request such a case ("running some in parallel and some in
>> sequence")? I haven't seen any requests for this in the wild (nor on this
>> thread), other than theoretical "what if" - which is totally fine, when it
>> doesn't introduce a lot of unnecessary complexity for little to no gain
>> (which seems to be the case here)
>>
>> h
>>
>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>> Because that simplicity doesn’t work.
>>
>> You will want to run some things serial and some things in parallel.
>>
>> Which is why you will need a dependency graph.
>>
>>
>> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>>
>> Why do you need rules and graphs and any of that to support running
>> everything sequentially or everything in parallel?
>>
>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
>> one at a time, in the order they’re defined. If parallel, run using current
>> scheme (as many at the same time as the threadpool permits)
>>
>> Simpler and covers all cases, imo
>>
>> ------------------------------
>> *From:* Polyakov Valeriy <v.polja...@tinkoff.ru>
>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* RE: Implementing run all paragraphs sequentially
>>
>> Let me try to summarize the discussion. Evidently, the current behavior of
>> running notes does not meet actual requirements. The most important thing
>> we need is the ability to run sequentially. At the same time, however, we
>> want to keep the ability to run in parallel. We discussed that the most
>> suitable way to express paragraph dependencies is a DAG (directed acyclic
>> graph). These dependencies should therefore be defined in the note, and
>> the running order should not depend on how we launch it (button /
>> scheduler / API). Our objectives, then, are to implement a “dependency
>> definition engine” and to use it in the “run engine”.
>> What are the options?
>> 1)      Explicit dependency definition.
>> We could take as a rule that each paragraph waits for ALL previous
>> paragraphs to finish executing. Then we add a paragraph option “Wait
>> for …” where we can choose the paragraph whose completion we wait for
>> before starting. When the option is set, the paragraph starts executing
>> immediately after the selected paragraph finishes. This pattern allows us
>> to implement a fully parallel DAG running order. What are the
>> disadvantages? They all come down to the same thing: dependency management
>> is not easy for users to understand (and, in my personal view, the
>> functionality is probably redundant). First, we would have to use the
>> strange paragraph ID format, which in addition is hidden. We could come up
>> with visible, readable paragraph ID aliases, but then we would need to
>> guard against duplicates. Second, in some scenarios we have to change
>> existing dependencies (e.g. to add a new paragraph between a paragraph and
>> its dependent group, you have to change the “Wait for …” option for each
>> paragraph in the group).
>> 2)      Implicit dependency definition.
>>
>> We could take as a rule that each paragraph waits for ALL previous
>> paragraphs to finish executing. Then we add a paragraph option “Run in
>> parallel with previous”, which allows us to create groups of paragraphs
>> that run in parallel. The result is sequential execution of paragraph
>> groups: group by group, with the paragraphs inside each group running in
>> parallel. This approach is much easier for users to understand, but its
>> obvious drawback compared with “Explicit definition” is that the
>> dependency graph and the level of parallelism are less flexible.
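
The implicit option can be sketched in a few lines of Python. This is only an illustration under assumptions, not a proposed Zeppelin implementation: `group_paragraphs` and `run_groups` are hypothetical names, and each paragraph is modelled as a `(name, callable, parallel_with_previous)` tuple standing in for the “Run in parallel with previous” flag.

```python
from concurrent.futures import ThreadPoolExecutor


def group_paragraphs(paragraphs):
    """Split paragraphs into groups that run one after another.

    A paragraph flagged parallel_with_previous joins the group of the
    paragraph before it; any other paragraph starts a new group.
    """
    groups = []
    for name, func, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append((name, func))
        else:
            groups.append([(name, func)])
    return groups


def run_groups(paragraphs, max_workers=4):
    """Run the groups sequentially and the paragraphs inside a group in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in group_paragraphs(paragraphs):
            futures = {name: pool.submit(func) for name, func in group}
            # Collecting every result here is what makes the next group
            # wait for the whole current group to finish.
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

The trade-off described above shows up directly: the grouping is easy to read off the note from top to bottom, but a paragraph cannot depend on just one member of an earlier group.
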
>> I am not sure which option, (1) or (2), is the right one to implement at
>> the moment. I hope to hear from the product visionaries which way to
>> choose and to get approval to start the implementation.
>> Thank you!
>>
>>
>>
>>
>> *Valeriy Polyakov*
>>
>> *From:* Michael Segel [mailto:msegel_had...@hotmail.com
>> <msegel_had...@hotmail.com>]
>> *Sent:* Saturday, September 30, 2017 4:22 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>> Sorry to jump in…
>>
>> If you want to run paragraphs in parallel, you are going to want to have
>> some sort of dependency graph.  Think of a common set up where you need to
>> set up common functions and imports. (setup of %spark.dep)
>>
>> A good example is if your notebook is a bunch of unit tests and you need
>> to build the common tear down / set up methods to be used by the other
>> paragraphs.
>>
>> If you’re going to do that, you’ll need to build out a metadata structure
>> where you can set up your dependencies as well as add things like labels
>> beyond the ids (which only need to be unique to the given notebook.)
>>
>> Just my $0.02
>>
>>
>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>>
>> The current behavior is as parallel as possible.
>> The Run notebook button currently submits all paragraphs in a notebook to
>> each interpreter's own scheduler (FIFO, Parallel) at once, and each
>> interpreter's scheduler then runs its paragraphs.
>>
>> I think we can provide a "sequential" run button for easier use, which
>> submits one paragraph and waits for it to finish before submitting the next.
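
The difference between the two submission strategies can be shown with a small Python sketch. It is a simplified model, not Zeppelin's scheduler API: the function names are hypothetical, and a single thread pool stands in for the per-interpreter schedulers.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_at_once(paragraphs, pool):
    """Submit every paragraph immediately (roughly the current behavior).

    Paragraphs may overlap in time, so one that needs the output of an
    earlier paragraph can start before that output exists.
    """
    futures = [pool.submit(paragraph) for paragraph in paragraphs]
    return [future.result() for future in futures]


def run_sequentially(paragraphs, pool):
    """Submit one paragraph, wait for it to finish, then submit the next."""
    results = []
    for paragraph in paragraphs:
        results.append(pool.submit(paragraph).result())
    return results
```

In the sequential variant a %pyspark step is only submitted after the %jdbc step before it has returned, which is the guarantee the original post tried to approximate with `time.sleep()`.
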
>>
>> And I think a sequential run button doesn't prevent us from having a more
>> complex / flexible DAG in the future?
>>
>> Thanks,
>> moon
>>
>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com>
>> wrote:
>>
>> What is the current behavior?
>>
>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com>
>> wrote:
>>
>> At least in our case, the notebooks that we need to run sequentially are
>> expected to *always* run sequentially - thus it makes more sense to be a
>> note option than a per-run mode
>>
>> H
>>
>>
>> _____________________________
>> From: moon soo Lee <m...@apache.org>
>> Sent: Thursday, September 28, 2017 9:03 PM
>> Subject: Re: Implementing run all paragraphs sequentially
>> To: <users@zeppelin.apache.org>
>> This is going to be really useful!
>>
>> Curious why you prefer a 'note option' instead of a 'run option'?
>> Could you compare their pros and cons?
>>
>> Thanks,
>> moon
>>
>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com>
>> wrote:
>>
>> +1, our internal users at Twitter also often request this
>>
>> ------------------------------
>> *From:* Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Implementing run all paragraphs sequentially
>>
>> Hello, users!
>>
>> At the moment our analysts often mix interpreters in their notes.
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling for regular reporting, so they have
>> to do something like `time.sleep()` to wait for the data from %jdbc. That
>> doesn't guarantee the result and doesn't look cool.
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>> We are really interested in implementing issue [2] and are ready to work
>> on it.
>>
>> It seems a good idea to discuss the requirements.
>> My idea is to introduce a note setting that defines the type of run to
>> use (parallel or sequential) and to leave "Run all" as the only button
>> that runs all the cells in the note. This makes sequential or parallel
>> running a `note option` rather than a `run option`.
>> The option will be controlled by a nearby button, as shown:
>>
>> <~WRD000.jpg>
>>
>>
>>
>> For new notes the default state would be "Run sequential all"; for old
>> notes, "Run parallel for interpreters".
>>
>> We are glad to hear any thoughts.
>> Thank you.
>>
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>>
>>
>> *Maksim Belousov*
>>
>>
>>
>
