>>> I suppose there is a fairly simple solution to the problem. We can use a flag on a paragraph that means “this paragraph should be run in parallel with the previous one”. Such logic would make it possible to mix sequential and parallel runs. It does not implement full DAG capabilities, but it’s easy to understand and to use.
This can cover some cases, but I don't think it can cover all of them.

Jeff Zhang <zjf...@gmail.com> wrote on Fri, Sep 29, 2017 at 7:52 PM:

> Yes, this may look a little complicated, but I think that is due to how we
> name paragraphs, not to this approach. IMHO, without specifying the
> dependency relationships between paragraphs, it is almost impossible to
> schedule paragraphs correctly.
>
> Sotnichenko Sergey <s.sotniche...@tinkoff.ru> wrote on Fri, Sep 29, 2017 at 7:45 PM:
>
>> To be honest, it would be very complicated to build a DAG with names like
>> ‘20170929-143857_1744629322’. Let’s imagine we have 20 paragraphs with such
>> names.
>>
>> *Sergey Sotnichenko*
>>
>> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
>> *Sent:* Friday, September 29, 2017 2:35 PM
>> *To:* users@zeppelin.apache.org
>> *Subject:* Re: Implementing run all paragraphs sequentially
>>
>> 'p1', 'p2' are paragraph ids. Regarding readability, we could allow users
>> to set paragraph names, but that is another story and could be a later
>> improvement.
>>
>> Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Fri, Sep 29, 2017 at 7:30 PM:
>>
>> Interesting idea. But by ‘p1’, ‘p2’, etc. did you literally mean that, or
>> were you using that as shorthand for the id of the paragraph?
>>
>> If the former, then what happens if someone inserts, deletes or reorders
>> paragraphs? But if the latter, then the paragraph ids wouldn’t be very easy
>> for someone to read and follow the dependency relationships…
>>
>> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
>> *Sent:* 29 September 2017 11:58
>> *To:* users@zeppelin.apache.org
>> *Subject:* EXT: Re: Implementing run all paragraphs sequentially
>>
>> I don't think two note settings (parallel/sequential) are sufficient for
>> paragraph scheduling (take the Spark tutorial note as an example: we should
>> run the paragraph that loads the bank data first, and then we could run all
>> the SQL paragraphs in parallel).
>> So the key is how we define the dependency
>> relationships between paragraphs. The paragraphs of a note could form a DAG
>> (directed acyclic graph). Sequential running is just one special kind of
>> DAG (a linked list).
>>
>> I believe we discussed this before in the community. My proposal is that we
>> could add an attribute to the interpreter indicator of each paragraph, so
>> that users can specify the paragraph's dependencies (if the user doesn't
>> specify them, the default dependency is the paragraph before it). Still
>> taking the Spark tutorial note as an example: we have 3 paragraphs. The
>> first one loads the bank data, and the second and third paragraphs query
>> the data. So paragraphs 2 and 3 can run in parallel but must run after
>> paragraph 1. We then need to specify their dependencies in the interpreter
>> indicator part. Of course, users don't need to specify dependencies if they
>> want to run all the paragraphs sequentially, because the default dependency
>> is the paragraph before it.
>>
>> Paragraph 1.
>>
>> %spark
>> // code to load bank data
>>
>> Paragraph 2.
>>
>> %spark.sql(deps=p1)
>> // query the bank data
>>
>> Paragraph 3.
>>
>> %spark.sql(deps=p1)
>> // query the bank data
>>
>> afancy <grou...@gmail.com> wrote on Fri, Sep 29, 2017 at 5:35 PM:
>>
>> +1
>>
>> I think this is one of the most important features. I don't know why this
>> requirement has been skipped.
>>
>> /afancy
>>
>> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
>> m.belou...@tinkoff.ru> wrote:
>>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>>
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to produce regular reports. So they
>> have to do something like `time.sleep()` to wait for the data from %jdbc.
>> That doesn't guarantee the result and doesn't look cool.
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>>
>> We are really interested in the implementation of issue [2] and are ready
>> to solve it.
>>
>> It seems like a good idea to discuss any requirements.
>>
>> My idea is to introduce a note setting that defines the type of running to
>> use (parallel or sequential) and leave "Run all" as the only button that
>> runs all the cells in the note. This will make sequential or parallel
>> running a `note option` rather than a `run option`.
>>
>> The option will be controlled by a nearby button as shown
>>
>> For new notes the default state would be "Run sequential all"; for old
>> notes, "Run parallel for interpreters".
>>
>> We are glad to hear any thoughts.
>>
>> Thank you.
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>> *Maksim Belousov*
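
The dependency proposal discussed in this thread (an optional `deps` attribute per paragraph, defaulting to the paragraph before it) can be illustrated with a small standalone sketch. This is not Zeppelin code: the function names `resolve_deps` and `execution_stages` and the tuple representation of a paragraph are hypothetical, chosen only to show how such dependencies could be grouped into stages whose members may run in parallel.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: (paragraph_id, deps).
# deps=None means the user did not specify any dependency.
Paragraph = Tuple[str, Optional[Sequence[str]]]


def resolve_deps(paragraphs: List[Paragraph]) -> Dict[str, List[str]]:
    """Apply the default rule: an unspecified dependency is the previous paragraph."""
    resolved: Dict[str, List[str]] = {}
    previous: Optional[str] = None
    for pid, deps in paragraphs:
        if deps is None:
            resolved[pid] = [previous] if previous is not None else []
        else:
            resolved[pid] = list(deps)
        previous = pid
    return resolved


def execution_stages(paragraphs: List[Paragraph]) -> List[List[str]]:
    """Group paragraphs into stages; paragraphs in one stage may run in parallel."""
    deps = resolve_deps(paragraphs)
    order = [pid for pid, _ in paragraphs]
    done = set()
    stages: List[List[str]] = []
    while len(done) < len(order):
        # A paragraph is ready once every paragraph it depends on has finished.
        ready = [p for p in order
                 if p not in done and all(d in done for d in deps[p])]
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        stages.append(ready)
        done.update(ready)
    return stages


# The Spark tutorial example from the thread: p2 and p3 both depend on p1.
tutorial = [("p1", None), ("p2", ["p1"]), ("p3", ["p1"])]
print(execution_stages(tutorial))    # [['p1'], ['p2', 'p3']]

# No deps specified anywhere: the default rule yields a purely sequential run.
sequential = [("p1", None), ("p2", None), ("p3", None)]
print(execution_stages(sequential))  # [['p1'], ['p2'], ['p3']]
```

With no `deps` given, the stages degenerate to one paragraph each, which matches the observation above that sequential running is just a special case of a DAG.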