Re: Implementing run all paragraphs sequentially

Jeff Zhang Fri, 29 Sep 2017 04:53:23 -0700

Yes, it may look a little complicated, but I think that is due to how we name
paragraphs, not to this approach. IMHO, without specifying the
dependency relationships between paragraphs, it is almost impossible to
schedule paragraphs correctly.
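
As a rough illustration of why explicit dependencies make scheduling tractable, here is a minimal Python sketch (not Zeppelin code). It assumes each paragraph is given as a hypothetical (id, deps) pair, where deps is None when the user did not specify anything, in which case the default is the paragraph ahead of it, as proposed in the quoted message below. The names build_graph and run_waves are made up for this example; graphlib is in the Python standard library from 3.9.

```python
from graphlib import TopologicalSorter


def build_graph(paragraphs):
    """Map each paragraph id to the set of ids it depends on.

    `paragraphs` is an ordered list of (paragraph_id, deps) pairs.
    deps=None means "not specified": default to the paragraph ahead.
    """
    graph = {}
    previous = None
    for paragraph_id, deps in paragraphs:
        if deps is None:
            deps = [previous] if previous is not None else []
        graph[paragraph_id] = set(deps)
        previous = paragraph_id
    return graph


def run_waves(graph):
    """Group paragraphs into waves; paragraphs in one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # a real scheduler would call this after each run finishes
    return waves


# Spark tutorial note: p2 and p3 both depend only on p1.
tutorial = [("p1", None), ("p2", ["p1"]), ("p3", ["p1"])]
print(run_waves(build_graph(tutorial)))

# No deps specified anywhere: the default chain gives sequential running.
sequential = [("p1", None), ("p2", None), ("p3", None)]
print(run_waves(build_graph(sequential)))
```

With the tutorial note this yields two waves (p1, then p2 and p3 together), while a note with no deps at all degenerates into one paragraph per wave, i.e. the linked-list case.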





On Fri, Sep 29, 2017 at 7:45 PM, Sotnichenko Sergey <[email protected]> wrote:

> To be honest, it would be very complicated to build a DAG with names like
> ‘20170929-143857_1744629322’. Imagine we have 20 paragraphs with such
> names.
>
>
>
>
>
>
> *Sergey Sotnichenko *
>
>
>
>
>
> *From:* Jeff Zhang [mailto:[email protected]]
> *Sent:* Friday, September 29, 2017 2:35 PM
> *To:* [email protected]
> *Subject:* Re: Implementing run all paragraphs sequentially
>
>
>
>
>
> 'p1' and 'p2' are paragraph ids. As for readability, we could allow users
> to set a paragraph name, but that is another story and could be a later
> improvement.
>
>
>
>
>
>
>
> On Fri, Sep 29, 2017 at 7:30 PM, Partridge, Lucas (GE Aviation)
> <[email protected]> wrote:
>
> Interesting idea.  But by ‘p1’, ‘p2’, etc did you literally mean that; or
> were you using that as shorthand for the id of the paragraph?
>
> If the former then what happens if someone inserts, deletes or reorders
> paragraphs? But if the latter then the paragraph ids wouldn’t be very easy
> for someone to read and follow the dependency relationships…
>
>
>
> *From:* Jeff Zhang [mailto:[email protected]]
> *Sent:* 29 September 2017 11:58
> *To:* [email protected]
> *Subject:* EXT: Re: Implementing run all paragraphs sequentially
>
>
>
>
>
> I don't think two note settings (parallel/sequential) are sufficient for
> paragraph scheduling. Take the Spark tutorial note as an example: we should
> run the paragraph that loads the bank data first, and then we could run all
> the SQL paragraphs in parallel. So the key is how we define the dependency
> relationships between paragraphs. The paragraphs of a note could form a DAG
> (directed acyclic graph). Sequential running is just one special kind of
> DAG (a linked list).
>
>
>
> I believe we have discussed this before in the community. My proposal is
> that we could add an attribute to the interpreter indicator of each
> paragraph, so that the user can specify the paragraph's dependencies (if the
> user doesn't specify any, the default dependency is the paragraph ahead of
> it). Again, take the Spark tutorial note as an example. We have 3
> paragraphs: the first one loads the bank data, and the second and third
> query the data. So paragraphs 2 and 3 can run in parallel but must run after
> paragraph 1. We then need to specify their dependency in the interpreter
> indicator part. Of course, users don't need to specify dependencies if they
> want to run all the paragraphs sequentially, because the default dependency
> is the paragraph ahead.
>
>
>
> Paragraph 1.
>
>
>
> %spark
>
> // code to load bank data
>
>
>
> Paragraph 2.
>
>
>
> %spark.sql(deps=p1)
>
> // query the bank data
>
>
>
> Paragraph 3.
>
> %spark.sql(deps=p1)
>
> // query the bank data
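
To make the proposed indicator syntax concrete, here is a minimal Python sketch of how a line such as %spark.sql(deps=p1) might be parsed. This is only an illustration, not Zeppelin's actual parser: the function name parse_indicator is hypothetical, and it assumes that deps is the only attribute and that several ids would be separated by commas, neither of which is specified in this thread.

```python
import re

# "%<interpreter>" optionally followed by "(<attributes>)", e.g. "%spark.sql(deps=p1)"
INDICATOR = re.compile(r"^%(?P<interpreter>[\w.]+)(?:\((?P<attrs>[^)]*)\))?$")


def parse_indicator(line):
    """Return (interpreter, deps); deps is None when no deps attribute is given."""
    match = INDICATOR.match(line.strip())
    if match is None:
        raise ValueError(f"not an interpreter indicator: {line!r}")
    attrs = (match.group("attrs") or "").strip()
    deps = None
    if attrs.startswith("deps="):
        # Assumption: multiple dependencies are comma-separated.
        deps = [d.strip() for d in attrs[len("deps="):].split(",") if d.strip()]
    return match.group("interpreter"), deps


print(parse_indicator("%spark"))
print(parse_indicator("%spark.sql(deps=p1)"))
print(parse_indicator("%spark.sql(deps=20170929-143857_1744629322)"))
```

A paragraph whose indicator carries no deps attribute returns None for deps, which a scheduler could treat as "depends on the paragraph ahead".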
>
>
>
>
>
>
>
>
>
> On Fri, Sep 29, 2017 at 5:35 PM, afancy <[email protected]> wrote:
>
> +1
>
> I think this is one of the most important features. I don't know why this
> requirement has been skipped.
>
>
>
> /afancy
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> [email protected]> wrote:
>
> Hello, users!
>
> At the moment our analysts often use mixes of interpreters in their notes.
>
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to produce regular reports, so they
> have to do something like `time.sleep()` to wait for the data from %jdbc.
> That doesn't guarantee the result and doesn't look good.
>
>
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
>
> We are really interested in implementation of the issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
>
> My idea is to introduce a note setting that defines the type of running to
> use (parallel or sequential) and to leave "Run all" as the only button
> that runs all the cells in the note. This makes sequential or parallel
> running a `note option` rather than a `run option`.
>
> The option will be controlled by a nearby button as shown
>
>
>
>
>
> For new notes the default state would be "Run sequential all"; for old
> notes, "Run parallel for interpreters".
>
> We are glad to hear any thoughts.
>
> Thank you.
>
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
>
>
>
>
> *Maksim Belousov*
>
>
>
>
