Yes, it may look a little complicated, but I think that is due to how we name paragraphs, not due to this approach. IMHO, without specifying the dependency relationships between paragraphs, it is almost impossible to schedule paragraphs correctly.
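To make the scheduling idea concrete, here is a minimal sketch of how declared dependencies could be turned into a run order. It is only an illustration under stated assumptions, not Zeppelin code: the function names `resolve_deps` and `schedule` and the `(paragraph_id, deps)` input shape are hypothetical, and the ids 'p1', 'p2', 'p3' stand in for real paragraph ids. A paragraph with no declared dependency falls back to the paragraph ahead of it, as proposed in the quoted message below.

```python
def resolve_deps(paragraphs):
    """Map each paragraph id to its dependencies.

    `paragraphs` is a list of (paragraph_id, deps) in note order (hypothetical
    input shape). deps=None means "not specified", which defaults to the
    paragraph ahead of it; the first paragraph then has no dependency.
    """
    resolved = {}
    previous = None
    for pid, deps in paragraphs:
        if deps is None:
            deps = [previous] if previous is not None else []
        resolved[pid] = list(deps)
        previous = pid
    return resolved


def schedule(paragraphs):
    """Return a list of waves; paragraphs within one wave could run in parallel."""
    deps = resolve_deps(paragraphs)
    for pid, ds in deps.items():
        for d in ds:
            if d not in deps:
                raise ValueError(f"{pid} depends on unknown paragraph {d}")

    remaining = {pid: set(ds) for pid, ds in deps.items()}
    waves = []
    while remaining:
        # Paragraphs whose dependencies have all finished are ready to run.
        ready = [pid for pid, ds in remaining.items() if not ds]
        if not ready:
            raise ValueError("dependencies contain a cycle, so this is not a DAG")
        waves.append(ready)
        for pid in ready:
            del remaining[pid]
        for ds in remaining.values():
            ds.difference_update(ready)
    return waves


# Spark tutorial example from this thread: p2 and p3 both depend on p1.
print(schedule([("p1", None), ("p2", ["p1"]), ("p3", ["p1"])]))
# prints [['p1'], ['p2', 'p3']]
```

With no dependencies specified at all, every paragraph depends on the one before it, so each wave holds a single paragraph and the note runs sequentially (the linked-list case of the DAG).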

Sotnichenko Sergey <s.sotniche...@tinkoff.ru> wrote on Fri, Sep 29, 2017 at 7:45 PM:

> To be honest, it would be very complicated to build a DAG with names like
> ‘20170929-143857_1744629322’. Let’s imagine we have 20 paragraphs with such
> names.
>
> *Sergey Sotnichenko*
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* Friday, September 29, 2017 2:35 PM
> *To:* users@zeppelin.apache.org
> *Subject:* Re: Implementing run all paragraphs sequentially
>
> 'p1', 'p2' is the paragraphId. Regarding readability, we could allow users
> to set a paragraph name, but that is another story and could be an
> improvement later.
>
> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Fri, Sep 29, 2017 at 7:30 PM:
>
> Interesting idea. But by ‘p1’, ‘p2’, etc. did you literally mean that, or
> were you using that as shorthand for the id of the paragraph?
>
> If the former, then what happens if someone inserts, deletes or reorders
> paragraphs? But if the latter, then the paragraph ids wouldn’t be very easy
> for someone to read and follow the dependency relationships…
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* 29 September 2017 11:58
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: Implementing run all paragraphs sequentially
>
> I don't think two note settings (parallel/sequential) are sufficient for
> paragraph scheduling (take the Spark tutorial note as an example: we should
> run the paragraph that loads the bank data first, and then we could run all
> the SQL paragraphs in parallel). So the key is how we define the dependency
> relationships between paragraphs. The paragraphs of a note could form a DAG
> (directed acyclic graph). Sequential running is just one special kind of
> DAG (a linked list).
>
> I believe we discussed this before in the community.
> My proposal is that we could add an attribute to the interpreter indicator
> of each paragraph, so that the user can specify the paragraph's dependencies
> (if the user doesn't specify any, the default dependency is the paragraph
> ahead of it). Still taking the Spark tutorial note as an example: we have 3
> paragraphs; the first one loads the bank data, and the second and third
> paragraphs query the data. So paragraphs 2 and 3 can run in parallel but
> must run after paragraph 1. Then we need to specify their dependencies in
> the interpreter indicator part. Of course, the user doesn't need to specify
> dependencies if they want to run all the paragraphs sequentially, because
> the default dependency is the paragraph ahead of it.
>
> Paragraph 1.
>
> %spark
> // code to load bank data
>
> Paragraph 2.
>
> %spark.sql(deps=p1)
> // query the bank data
>
> Paragraph 3.
>
> %spark.sql(deps=p1)
> // query the bank data
>
> afancy <grou...@gmail.com> wrote on Fri, Sep 29, 2017 at 5:35 PM:
>
> +1
>
> I think this is one of the most important features. I don't know why this
> requirement has been skipped.
>
> /afancy
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belou...@tinkoff.ru> wrote:
>
> Hello, users!
>
> At the moment our analysts often use mixes of interpreters in their notes.
>
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling to do some regular reporting. And they
> have to do something like `time.sleep()` to wait for the data from %jdbc. It
> doesn't guarantee the result and doesn't look cool.
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
>
> We are really interested in the implementation of issue [2] and are ready
> to solve it.
>
> It seems a good idea to discuss any requirements.
>
> My idea is to introduce a note setting that defines the type of running to
> use (parallel or sequential) and leave "Run all" as the only button
> running all the cells in the note. This will make sequential or parallel
> running a `note option` rather than a `run option`.
>
> The option will be controlled by a nearby button as shown
>
> For new notes the default state would be "Run sequential all", for old ones
> "Run parallel for interpreters".
>
> We are glad to hear any thoughts.
>
> Thank you.
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
> *Maksim Belousov*