I don't think a two-valued note setting (parallel/sequential) is sufficient for paragraph scheduling. Take the spark tutorial note as an example: we should run the paragraph that loads the bank data first, and only then can we run all the SQL paragraphs in parallel. So the key is how we define the dependency relationships between paragraphs. The paragraphs of a note can form a DAG (directed acyclic graph). Sequential running is just one special kind of DAG (a linked list).
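To make the DAG idea concrete, here is a minimal sketch in plain Python (not Zeppelin code). It groups paragraphs into "waves" that could run in parallel once everything they depend on has finished. The paragraph ids `p1`..`p3` and the helper `run_waves` are names I made up for illustration, and it relies on the standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_waves(deps):
    """Group paragraphs into waves.

    `deps` maps a paragraph id to the set of paragraph ids it depends on.
    Paragraphs in the same wave have no unmet dependencies, so a scheduler
    could run them in parallel. This is an illustrative helper only.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


# Spark tutorial note: p1 loads the bank data, p2 and p3 query it.
tutorial = {"p1": set(), "p2": {"p1"}, "p3": {"p1"}}

# Sequential running is the special case where each paragraph depends
# on the one before it (a linked list).
sequential = {"p1": set(), "p2": {"p1"}, "p3": {"p2"}}

print(run_waves(tutorial))
print(run_waves(sequential))
```

With the tutorial graph the two query paragraphs end up in the same wave after the loading paragraph; with the linked-list graph every wave holds exactly one paragraph, which is today's sequential behaviour.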
I believe we have discussed this before in the community. My proposal is to add an attribute to the interpreter indicator of each paragraph, so that the user can specify the paragraph's dependencies (if the user doesn't specify any, the default dependency is the paragraph immediately before it). Take the spark tutorial note as an example again. It has 3 paragraphs: the first one loads the bank data, and the second and third query that data. So paragraphs 2 and 3 can run in parallel, but must run after paragraph 1. We then specify their dependencies in the interpreter indicator. Of course, users don't need to specify dependencies if they want to run all the paragraphs sequentially, because the default dependency is the preceding paragraph.

Paragraph 1.
%spark
// code to load bank data

Paragraph 2.
%spark.sql(deps=p1)
// query the bank data

Paragraph 3.
%spark.sql(deps=p1)
// query the bank data

On Fri, Sep 29, 2017 at 5:35 PM, afancy <grou...@gmail.com> wrote:

> +1
>
> I think this is one of the most important features. don't know why this
> requirement has been skipped.
>
> /afancy
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belou...@tinkoff.ru> wrote:
>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>>
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to make some regular reporting. And they
>> should do something like `time.sleep()` to wait for the data from %jdbc. It
>> doesn't guarantee the result and doesn't look cool.
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>>
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>> It seems a good idea to discuss any requirements.
>>
>> My idea is to introduce note setting that defines the type of running to
>> use (parallel or sequential) and leave "Run all" to be the only button
>> running all the cells in the note. This will make sequential or parallel
>> running the `note option` but not `run option`.
>>
>> Option will be controlled by nearby button as shown
>>
>> [image:
>> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>>
>> For new notes the default state would be "Run sequential all", for old -
>> "Run parallel for interpreters"
>>
>> We are glad to hear any thoughts.
>>
>> Thank you.
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>> *Maksim Belousov *