I suppose there is a fairly simple solution to the problem. We could add a flag to a paragraph that means “this paragraph should run in parallel with the previous one”. Such logic would allow mixed sequential-parallel runs. It does not provide full DAG capabilities, but it is easy to understand and to use.
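
A minimal sketch of the flag idea above, written in Python purely for illustration. It is not Zeppelin code: the `Paragraph` class, the `parallel_with_previous` field and `build_stages` are hypothetical names. Stages run one after another, and the paragraphs inside a stage may run together.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    pid: str
    # Hypothetical flag: "run this paragraph in parallel with the previous one".
    parallel_with_previous: bool = False


def build_stages(paragraphs):
    """Group an ordered list of paragraphs into sequential stages.

    A flagged paragraph joins the stage of the paragraph before it;
    an unflagged paragraph starts a new stage.
    """
    stages = []
    for p in paragraphs:
        if p.parallel_with_previous and stages:
            stages[-1].append(p.pid)
        else:
            # The first paragraph always starts a stage, flagged or not.
            stages.append([p.pid])
    return stages


# Spark tutorial shape: load data first, then two queries side by side.
note = [
    Paragraph("p1"),
    Paragraph("p2"),
    Paragraph("p3", parallel_with_previous=True),
]
stages = build_stages(note)
print(stages)  # [['p1'], ['p2', 'p3']]
```

With no flags set, every paragraph gets its own stage, which is plain sequential running.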
Valeriy Polyakov

From: Sotnichenko Sergey [mailto:s.sotniche...@tinkoff.ru]
Sent: Friday, September 29, 2017 2:45 PM
To: users@zeppelin.apache.org
Subject: RE: Implementing run all paragraphs sequentially

To be honest, it would be very complicated to build a DAG with names like ‘20170929-143857_1744629322’. Imagine we have 20 paragraphs with such names.

Sergey Sotnichenko

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Friday, September 29, 2017 2:35 PM
To: users@zeppelin.apache.org
Subject: Re: Implementing run all paragraphs sequentially

'p1' and 'p2' are paragraph IDs. As for readability, we could allow users to set a paragraph name, but that is another story and could be a later improvement.

Partridge, Lucas (GE Aviation) <lucas.partri...@ge.com> wrote on Fri, Sep 29, 2017 at 7:30 PM:

Interesting idea. But by ‘p1’, ‘p2’, etc. did you literally mean that, or were you using that as shorthand for the id of the paragraph? If the former, then what happens if someone inserts, deletes or reorders paragraphs? But if the latter, then the paragraph ids wouldn’t be very easy for someone to read and follow the dependency relationships…

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: 29 September 2017 11:58
To: users@zeppelin.apache.org
Subject: EXT: Re: Implementing run all paragraphs sequentially

I don't think a two-valued note setting (parallel/sequential) is sufficient for paragraph scheduling (take the Spark tutorial note as an example: we should run the paragraph that loads the bank data first, and then we could run all the SQL paragraphs in parallel). So the key is how we define the dependency relationship between paragraphs. The paragraphs of a note could form a DAG (directed acyclic graph). Sequential running is just one special kind of DAG (a linked list). I believe we discussed this before in the community.
My proposal is that we could add an attribute to the interpreter indicator of each paragraph, so that the user can specify the paragraph's dependencies (if the user doesn't specify any, the default dependency is the paragraph ahead of it). Again take the Spark tutorial note as an example. We have 3 paragraphs: the first loads the bank data, and the second and third query the data. So paragraphs 2 and 3 can run in parallel but must run after paragraph 1. Then we need to specify their dependencies in the interpreter indicator part. Of course, users don't need to specify dependencies if they want to run all the paragraphs sequentially, because the default dependency is the paragraph ahead of it.

Paragraph 1.
%spark
// code to load bank data

Paragraph 2.
%spark.sql(deps=p1)
// query the bank data

Paragraph 3.
%spark.sql(deps=p1)
// query the bank data

afancy <grou...@gmail.com> wrote on Fri, Sep 29, 2017 at 5:35 PM:

+1. I think this is one of the most important features. I don't know why this requirement has been skipped.

/afancy

On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <m.belou...@tinkoff.ru> wrote:

Hello, users!

At the moment our analysts often mix interpreters in their notes. For example, they prepare data using %jdbc and then use it in %pyspark. Besides, they often use scheduling to produce regular reports, and they have to do something like `time.sleep()` to wait for the data from %jdbc. That doesn't guarantee the result and doesn't look cool.

You can find early attempts to implement sequential running of all paragraphs in [1]. We are really interested in the implementation of issue [2] and are ready to work on it. It seems a good idea to discuss the requirements first.

My idea is to introduce a note setting that defines the type of run to use (parallel or sequential) and to leave "Run all" as the only button that runs all the cells in the note.
This will make sequential or parallel running a `note option` rather than a `run option`. The option will be controlled by a nearby button, as shown.

For new notes the default state would be "Run sequential all"; for old notes, "Run parallel for interpreters".

We would be glad to hear any thoughts. Thank you.

[1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
[2] https://issues.apache.org/jira/browse/ZEPPELIN-2368

Maksim Belousov
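
The `deps=` proposal from Jeff Zhang's message in this thread can be sketched roughly as follows. This is only an illustration in Python (3.9+ for the standard-library `graphlib` module), not Zeppelin's implementation: `resolve_deps` and `execution_waves` are hypothetical helper names, and the `(pid, deps)` tuples stand in for whatever a parsed interpreter indicator would actually provide. A paragraph without explicit dependencies depends on the paragraph ahead of it, as the proposal describes.

```python
from graphlib import TopologicalSorter


def resolve_deps(paragraphs):
    """Build {paragraph id: set of ids it depends on}.

    `paragraphs` is an ordered list of (pid, deps) pairs. deps=None means
    "not specified", so the default applies: depend on the preceding paragraph.
    """
    graph = {}
    prev = None
    for pid, deps in paragraphs:
        if deps is None:
            deps = [prev] if prev is not None else []
        graph[pid] = set(deps)
        prev = pid
    return graph


def execution_waves(graph):
    """Return lists of paragraph ids; each list can run in parallel,
    and the lists themselves run one after another."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the deps form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The Spark tutorial example: p2 and p3 both declare deps=p1.
note = [("p1", None), ("p2", ["p1"]), ("p3", ["p1"])]
waves = execution_waves(resolve_deps(note))
print(waves)  # [['p1'], ['p2', 'p3']]
```

If no paragraph declares dependencies, the default rule yields a linked list, so the same code degrades to plain sequential running.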