Re: Implementing run all paragraphs sequentially

Herval Freire Mon, 02 Oct 2017 09:24:31 -0700

 "nice to have" isn't a very strong requirement. I strongly suggest you
really, really think about this before you start pounding an overengineered
solution to a non-issue :-)


h

On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <[email protected]>
wrote:

> Yes…
> You have a bunch of unit tests you can run in parallel where you only need
> one constructor and one cleanup.
>
> I would strongly suggest that you really, really think about this long and
> hard before you start to pound code.
> It's going to be harder to back out and fix than if you take the time to
> think through the problem and not make a dumb mistake.
>
> On Oct 2, 2017, at 11:02 AM, Herval Freire <[email protected]> wrote:
>
> Did anyone request such a case ("running some in parallel and some in
> sequence")? I haven't seen any requests for this in the wild (nor on this
> thread), other than theoretical "what if" - which is totally fine, when it
> doesn't introduce a lot of unnecessary complexity for little to no gain
> (which seems to be the case here)
>
> h
>
> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <[email protected]>
> wrote:
>
>> Because that simplicity doesn’t work.
>>
>> You will want to run some things serial and some things in parallel.
>>
>> Which is why you will need a dependency graph.
>>
>> On Oct 2, 2017, at 10:40 AM, Herval Freire <[email protected]> wrote:
>>
>> Why do you need rules and graphs and any of that to support running
>> everything sequentially or everything in parallel?
>>
>> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
>> one at a time, in the order they’re defined. If parallel, run using current
>> scheme (as many at the same time as the threadpool permits)
>>
>> Simpler and covers all cases, imo
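The "run mode" proposal above can be sketched in a few lines. This is only an illustration, not Zeppelin code: `run_note`, the `run_mode` strings, and modelling a paragraph as a plain callable are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model: a paragraph is a callable that returns its output.
Paragraph = Callable[[], str]


def run_note(paragraphs: List[Paragraph], run_mode: str, pool_size: int = 4) -> List[str]:
    """Run every paragraph of a note according to the note's run mode."""
    if run_mode == "sequential":
        # One at a time, in the order the paragraphs are defined.
        return [paragraph() for paragraph in paragraphs]
    if run_mode == "parallel":
        # As many at once as the pool permits; outputs come back in definition order.
        with ThreadPoolExecutor(max_workers=pool_size) as pool:
            futures = [pool.submit(paragraph) for paragraph in paragraphs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {run_mode!r}")
```

Because the mode is an argument read from the note rather than from the caller, a button, a scheduler, and an API call would all run the note the same way.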
>>
>> ------------------------------
>> *From:* Polyakov Valeriy <[email protected]>
>> *Sent:* Monday, October 2, 2017 8:24:35 AM
>> *To:* [email protected]
>> *Subject:* RE: Implementing run all paragraphs sequentially
>>
>> Let me try to summarize the discussion. Evidently, the current behavior of
>> running notes does not meet actual requirements. The most important thing
>> that we need is the ability to run sequentially. However, at the same
>> time we want to keep the ability to run in parallel. We discussed that
>> the most suitable way to model paragraphs' dependencies is a DAG
>> (directed acyclic graph). These dependencies should therefore be defined
>> in the note, and the running order should not depend on how we
>> launch it (button / scheduler / API). Our objectives, then, are to
>> implement a “dependency definition engine” and to use it in the “run engine”.
>> What are the options?
>> 1)      Explicit dependency definition.
>> As a rule, each paragraph waits for ALL previous paragraphs to finish
>> executing. Then we add a paragraph option “Wait for …” that selects the
>> paragraph whose completion we wait for. When the option is set, the
>> paragraph starts immediately after the selected paragraph finishes. This
>> pattern allows a fully parallel DAG running order. What are the
>> disadvantages? They all come down to the same thing: the dependency
>> management process is hard for users to understand (and, in my personal
>> view, the functionality is probably redundant). First, we would have to use
>> the strange paragraph ID format, which is also hidden. We could come up with
>> visible, friendly paragraph ID aliases, but then we would need to guard
>> against duplicates. Second, some scenarios require changing existing
>> dependencies (e.g. to add a new paragraph between one paragraph and a group
>> that depends on it, you have to change the “Wait for …” option for each
>> paragraph in the group).
>> 2)      Implicit dependency definition.
>>
>> As a rule, each paragraph waits for ALL previous paragraphs to finish
>> executing. Then we add a paragraph option “Run in parallel with previous”,
>> which lets us create groups of paragraphs that run in parallel. The result
>> is sequential execution of paragraph groups: group by group, with the
>> paragraphs inside each group running in parallel. This approach is much
>> easier for users to understand, but the obvious drawback compared with
>> “Explicit definition” is that the dependency graph is less expressive and
>> the level of parallelism is lower.
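To make the two options concrete, here is a rough sketch of how each definition could be turned into a running order. It is not taken from Zeppelin: the function names, the boolean flag list, and the `wait_for` mapping are hypothetical, and option 1 is simplified by assuming that every paragraph's dependencies have already been resolved into explicit IDs.

```python
from typing import Dict, List, Sequence


def groups_from_flags(parallel_with_previous: Sequence[bool]) -> List[List[int]]:
    """Option 2: split paragraph indexes into groups that run one after another.

    parallel_with_previous[i] is True when paragraph i has the
    "Run in parallel with previous" option set.
    """
    groups: List[List[int]] = []
    for index, flag in enumerate(parallel_with_previous):
        if flag and groups:
            groups[-1].append(index)  # joins the group of the previous paragraph
        else:
            groups.append([index])    # starts a new group
    return groups


def levels_from_wait_for(wait_for: Dict[str, List[str]]) -> List[List[str]]:
    """Option 1: order paragraphs into levels from explicit "Wait for" links.

    wait_for maps each paragraph ID to the IDs it waits for. Paragraphs in the
    same level have no unfinished dependencies and may run in parallel.
    """
    remaining = {pid: set(deps) for pid, deps in wait_for.items()}
    levels: List[List[str]] = []
    while remaining:
        ready = sorted(pid for pid, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle or unknown paragraph ID")
        levels.append(ready)
        for pid in ready:
            del remaining[pid]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels
```

Grouping into levels is itself a simplification: a full DAG runner would start each paragraph as soon as its own dependencies finish rather than waiting for the whole level, which is where option 1 gains parallelism over option 2.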
>> I am not sure which option, (1) or (2), is the right one to implement at
>> the moment. I hope to hear from product visionaries which way to choose and
>> to get approval to start implementation.
>> Thank you!
>>
>> *Valeriy Polyakov *
>>
>>
>> *From:* Michael Segel [mailto:[email protected]
>> <[email protected]>]
>> *Sent:* Saturday, September 30, 2017 4:22 PM
>> *To:* [email protected]
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>>
>> Sorry to jump in…
>>
>>
>> If you want to run paragraphs in parallel, you are going to want to have
>> some sort of dependency graph.  Think of a common set up where you need to
>> set up common functions and imports. (setup of %spark.dep)
>>
>>
>> A good example is if your notebook is a bunch of unit tests and you need
>> to build the common tear down / set up methods to be used by the other
>> paragraphs.
>>
>>
>> If you’re going to do that, you’ll need to build out a metadata structure
>> where you can set up your dependencies as well as add things like labels
>> beyond the IDs (which only need to be unique within the given notebook).
>>
>>
>> Just my $0.02
>>
>>
>>
>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <[email protected]> wrote:
>>
>>
>> Current behavior is as parallel as possible.
>> The Run notebook button currently submits all paragraphs in a notebook to
>> each interpreter's own scheduler (FIFO, Parallel) at once. Each
>> interpreter's scheduler then runs its own paragraphs.
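A toy model of this behavior shows why a %pyspark paragraph can run before the %jdbc paragraph it depends on. None of it is Zeppelin code: `run_all_at_once` is a hypothetical name, each interpreter's FIFO scheduler is approximated by a single-worker thread pool, and the slow query is simulated with an event.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_all_at_once(paragraphs: List[Tuple[str, Callable[[], object]]]) -> List[object]:
    """Submit every paragraph to its interpreter's scheduler immediately."""
    schedulers: Dict[str, ThreadPoolExecutor] = {}
    try:
        futures = []
        for interpreter, body in paragraphs:
            if interpreter not in schedulers:
                # A single-worker pool stands in for one interpreter's FIFO scheduler.
                schedulers[interpreter] = ThreadPoolExecutor(max_workers=1)
            futures.append(schedulers[interpreter].submit(body))
        return [future.result() for future in futures]
    finally:
        for scheduler in schedulers.values():
            scheduler.shutdown()


table: Dict[str, int] = {}
pyspark_has_read = threading.Event()


def jdbc_paragraph() -> str:
    # Simulated slow query: it completes only after the %pyspark paragraph has read.
    pyspark_has_read.wait(timeout=5)
    table["rows"] = 3
    return "loaded"


def pyspark_paragraph() -> object:
    rows = table.get("rows")  # nothing orders this read after the %jdbc paragraph
    pyspark_has_read.set()
    return rows


results = run_all_at_once([("jdbc", jdbc_paragraph), ("pyspark", pyspark_paragraph)])
```

Ordering is only guaranteed inside one interpreter's queue, so in this model the %pyspark paragraph reads the table before the %jdbc paragraph has filled it.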
>>
>>
>> I think we can provide a "sequential" run button for easier use, which
>> submits one paragraph and waits for it to finish before submitting the next.
>>
>>
>> And I think a sequential run button doesn't prevent us from having a more
>> complex / flexible DAG in the future?
>>
>>
>> Thanks,
>> moon
>>
>>
>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <[email protected]>
>> wrote:
>>
>> What is the current behavior?
>>
>>
>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <[email protected]>
>> wrote:
>>
>> At least in our case, the notebooks that we need to run sequentially are
>> expected to *always* run sequentially, so it makes more sense for this to
>> be a note option than a per-run mode
>>
>>
>> H
>>
>>
>>
>> _____________________________
>> From: moon soo Lee <[email protected]>
>> Sent: Thursday, September 28, 2017 9:03 PM
>> Subject: Re: Implementing run all paragraphs sequentially
>> To: <[email protected]>
>>
>> This is going to be really useful!
>>
>>
>> Curious why you prefer 'note option' instead of 'run option'?
>> Could you compare their pros and cons?
>>
>>
>> Thanks,
>> moon
>>
>>
>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <[email protected]>
>> wrote:
>>
>> +1, our internal users at Twitter also often request this
>>
>>
>> ------------------------------
>> *From:* Belousov Maksim Eduardovich <[email protected]>
>> *Sent:* Thursday, September 28, 2017 8:28:58 AM
>> *To:* [email protected]
>> *Subject:* Implementing run all paragraphs sequentially
>>
>>
>> Hello, users!
>>
>>
>> At the moment our analysts often mix interpreters in their notes.
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> They also often use scheduling to produce regular reports, so they
>> have to do something like `time.sleep()` to wait for the data from %jdbc. It
>> doesn't guarantee the result and doesn't look cool.
>>
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>>
>> It seems a good idea to discuss any requirements.
>> My idea is to introduce a note setting that defines the type of run to
>> use (parallel or sequential) and leave "Run all" as the only button
>> that runs all the cells in the note. This will make sequential or parallel
>> running a `note option` rather than a `run option`.
>> The option will be controlled by a nearby button, as shown
>>
>>
>> <~WRD000.jpg>
>> For new notes the default state would be "Run sequential all", and for old
>> notes "Run parallel for interpreters".
>>
>>
>> We are glad to hear any thoughts.
>> Thank you.
>>
>>
>>
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>> *Maksim Belousov*
>>
>>
>>
>
>
