Did anyone actually request such a case ("running some in parallel and some in
sequence")? I haven't seen any requests for this in the wild (nor on this
thread), other than a theoretical "what if", which is totally fine as long as
it doesn't introduce a lot of unnecessary complexity for little to no gain
(which seems to be the case here).
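
To make the note-level "run mode" idea quoted below concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: `run_note`, `run_mode` and `max_workers` are hypothetical names, paragraphs are modeled as plain callables, and none of this is Zeppelin's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model: a paragraph is just a callable that returns its output.
Paragraph = Callable[[], str]


def run_note(paragraphs: List[Paragraph],
             run_mode: str = "parallel",
             max_workers: int = 4) -> List[str]:
    """Run every paragraph of a note according to a note-level run mode."""
    if run_mode == "sequential":
        # One at a time, in the order the paragraphs are defined.
        return [paragraph() for paragraph in paragraphs]
    if run_mode == "parallel":
        # Submit everything at once; the pool size caps the concurrency.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(paragraph) for paragraph in paragraphs]
            # Results are collected in definition order, not finish order.
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {run_mode!r}")
```

Because the mode is stored with the note, "Run all", the scheduler and the API would all end up calling the same function with the same setting.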

h

On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel <msegel_had...@hotmail.com>
wrote:

> Because that simplicity doesn’t work.
>
> You will want to run some things serially and some things in parallel.
>
> Which is why you will need a dependency graph.
>
> On Oct 2, 2017, at 10:40 AM, Herval Freire <hfre...@twitter.com> wrote:
>
> Why do you need rules and graphs and any of that to support running
> everything sequentially or everything in parallel?
>
> 3) add a “run mode” to the note. If it’s “sequential”, run the paragraphs
> one at a time, in the order they’re defined. If it’s “parallel”, run using
> the current scheme (as many at the same time as the thread pool permits)
>
> Simpler and covers all cases, imo
>
> ------------------------------
> *From:* Polyakov Valeriy <v.polja...@tinkoff.ru>
> *Sent:* Monday, October 2, 2017 8:24:35 AM
> *To:* users@zeppelin.apache.org
> *Subject:* RE: Implementing run all paragraphs sequentially
>
> Let me try to summarize the discussion. Evidently, the current behavior of
> running notes does not meet actual requirements. The most important thing
> we need is the ability to run paragraphs sequentially. At the same time, we
> want to keep the ability to run them in parallel. We discussed that the most
> suitable way to model paragraph dependencies is a DAG (directed acyclic
> graph). These dependencies should therefore be defined in the note, and the
> running order should not depend on how we launch it (button / scheduler /
> API). Our objectives, then, are to implement a “dependency definition
> engine” and to use it in the “run engine”.
> What are the options?
> 1) Explicit dependency definition.
> We could take as a rule that each paragraph waits for ALL previous
> paragraphs to finish. Then we add a paragraph option “Wait for …” that
> selects the paragraph whose completion we wait for before starting. When the
> option is set, the paragraph starts immediately after the selected paragraph
> finishes. This pattern allows us to implement a fully parallel DAG running
> order. What are the disadvantages? They all come down to the same thing: the
> dependency management process is hard for users to understand (and, in my
> personal view, the functionality is probably redundant). First, we would
> have to use the strange paragraph ID format, which is also hidden. We could
> come up with visible, friendly paragraph ID aliases, but then we would need
> to guard against duplicates. Second, some scenarios require changing
> existing dependencies (e.g. to add a new paragraph between a paragraph and
> the group that depends on it, you have to change the “Wait for …” option for
> each paragraph in the group).
> 2) Implicit dependency definition.
> We could take as a rule that each paragraph waits for ALL previous
> paragraphs to finish. Then we add a paragraph option “Run in parallel with
> previous”, which allows us to create groups of paragraphs that run in
> parallel. The result is sequential execution of paragraph groups: group by
> group, with the paragraphs inside each group running in parallel. This
> approach is much easier for users to understand, but its obvious drawback
> compared with “Explicit definition” is that the dependency graph is less
> expressive and the achievable level of parallelism is lower.
> I am not sure which option, (1) or (2), is the right one to implement at the
> moment. I hope to hear from the product visionaries which way to choose and
> to get approval to start implementation.
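
For comparison, the two options above can be sketched in a few lines of Python. This is a simplified illustration, not a proposed implementation: the function names and the tuple-based paragraph model are hypothetical, `graphlib` requires Python 3.9+, and option (1) is reduced to "waves" of paragraphs that may start together, which is slightly more conservative than starting each paragraph the moment its dependency finishes.

```python
from graphlib import TopologicalSorter
from typing import List, Optional, Tuple


def implicit_groups(paragraphs: List[Tuple[str, bool]]) -> List[List[str]]:
    """Option (2): (id, run_in_parallel_with_previous) pairs -> ordered groups."""
    groups: List[List[str]] = []
    for pid, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(pid)   # join the previous paragraph's group
        else:
            groups.append([pid])     # start a new sequential step
    return groups


def explicit_waves(paragraphs: List[Tuple[str, Optional[str]]]) -> List[List[str]]:
    """Option (1): (id, wait_for) pairs -> waves of paragraphs ready together."""
    deps = {}
    seen: List[str] = []
    for pid, wait_for in paragraphs:
        # Default rule: wait for ALL previous paragraphs unless "Wait for" is set.
        deps[pid] = {wait_for} if wait_for is not None else set(seen)
        seen.append(pid)
    sorter = TopologicalSorter(deps)
    sorter.prepare()                 # raises CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

In this model, inserting a paragraph between two groups under option (2) means adding one entry, while under option (1) every `wait_for` that pointed at the old predecessor has to be updated, which is the maintenance cost described above.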
> Thank you!
>
> *Valeriy Polyakov *
>
>
> *From:* Michael Segel <msegel_had...@hotmail.com>
> *Sent:* Saturday, September 30, 2017 4:22 PM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Implementing run all paragraphs sequentially
>
>
> Sorry to jump in…
>
>
> If you want to run paragraphs in parallel, you are going to want to have
> some sort of dependency graph. Think of a common setup where you need to
> define shared functions and imports first (for example, %spark.dep setup).
>
>
> A good example is if your notebook is a bunch of unit tests and you need
> to build the common tear down / set up methods to be used by the other
> paragraphs.
>
>
> If you’re going to do that, you’ll need to build out a metadata structure
> where you can set up your dependencies as well as add things like labels
> beyond the IDs (which only need to be unique within the given notebook).
>
>
> Just my $0.02
>
>
>
> On Sep 29, 2017, at 1:30 PM, moon soo Lee <m...@apache.org> wrote:
>
>
> The current behavior is as parallel as possible.
> The "Run notebook" button currently submits all paragraphs in a notebook to
> each interpreter's own scheduler (FIFO or Parallel) at once, and each
> interpreter's scheduler then runs its paragraphs.
>
>
> I think we can provide a "sequential" run button for easier use, which
> submits one paragraph and waits for it to finish before submitting the next.
>
>
> And I think a sequential run button doesn't prevent us from having a more
> complex / flexible DAG in the future?
>
>
> Thanks,
> moon
>
>
> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <mohitja...@gmail.com> wrote:
>
> What is the current behavior?
>
>
> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <hfre...@twitter.com>
> wrote:
>
> At least in our case, the notebooks that we need to run sequentially are
> expected to *always* run sequentially - thus it makes more sense to be a
> note option than a per-run mode
>
>
> H
>
>
>
> _____________________________
> From: moon soo Lee <m...@apache.org>
> Sent: Thursday, September 28, 2017 9:03 PM
> Subject: Re: Implementing run all paragraphs sequentially
> To: <users@zeppelin.apache.org>
>
> This is going to be really useful!
>
>
> Curious why you prefer 'note option' instead of 'run option'?
> Could you compare their pros and cons?
>
>
> Thanks,
> moon
>
>
> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <hfre...@twitter.com> wrote:
>
> +1, our internal users at Twitter also often request this
>
>
> ------------------------------
> *From:* Belousov Maksim Eduardovich <m.belou...@tinkoff.ru>
> *Sent:* Thursday, September 28, 2017 8:28:58 AM
> *To:* users@zeppelin.apache.org
> *Subject:* Implementing run all paragraphs sequentially
>
>
> Hello, users!
>
>
> At the moment our analysts often mix interpreters in their notes.
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to produce regular reports, so they
> have to do something like `time.sleep()` to wait for the data from %jdbc.
> That doesn't guarantee the result and doesn't look good.
>
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
> We are really interested in implementation of the issue [2] and are ready
> to solve it.
>
>
> It seems a good idea to discuss any requirements.
> My idea is to introduce a note setting that defines the type of run to
> use (parallel or sequential) and to leave "Run all" as the only button
> that runs all the cells in the note. This makes sequential or parallel
> running a `note option` rather than a `run option`.
> The option would be controlled by a nearby button, as shown:
>
>
> <~WRD000.jpg>
>
> For new notes the default state would be "Run sequential all"; for old
> notes, "Run parallel for interpreters".
>
>
> We would be glad to hear any thoughts.
> Thank you.
>
>
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
> *Maksim Belousov*