I don't think a two-valued note setting (parallel/sequential) is sufficient
for paragraph scheduling. Take the Spark tutorial note as an example: we
should run the paragraph that loads the bank data first, and then all the
SQL paragraphs could run in parallel. So the key is how we define the
dependency relationships between paragraphs. The paragraphs of a note can
form a DAG (directed acyclic graph). Sequential running is just one special
kind of DAG (a linked list).
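
To make the DAG idea concrete, here is a minimal sketch in Python (not
Zeppelin code). The paragraph ids p1..p3 and the function name run_batches
are hypothetical; the sketch only assumes that each paragraph lists the
paragraphs it depends on. It groups paragraphs into batches, where every
paragraph in a batch could run in parallel once the earlier batches finish.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_batches(deps):
    """Group paragraph ids into batches that could run in parallel.

    `deps` maps each paragraph id to the ids it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # paragraphs whose deps are done
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


# Spark tutorial note: p2 and p3 both depend only on p1.
tutorial = {"p1": [], "p2": ["p1"], "p3": ["p1"]}

# Sequential running: each paragraph depends on the one before it.
sequential = {"p1": [], "p2": ["p1"], "p3": ["p2"]}

print(run_batches(tutorial))
print(run_batches(sequential))
```

With the tutorial graph, the second batch contains both p2 and p3. With the
linked-list graph, every batch holds a single paragraph, which is plain
sequential running.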

I believe we discussed this before in the community. My proposal is to add
an attribute to the interpreter indicator of each paragraph, so that the
user can specify the paragraph's dependency (if the user doesn't specify
it, the default dependency is the paragraph before it). Take the Spark
tutorial note as an example again. We have 3 paragraphs: the first one
loads the bank data, and the second and third query the data. So
paragraphs 2 and 3 can run in parallel but must run after paragraph 1, and
we need to specify that dependency in their interpreter indicators. Of
course, users don't need to specify dependencies if they want to run all
the paragraphs sequentially, because the default dependency is the
paragraph before it.

Paragraph 1.

%spark
// code to load bank data

Paragraph 2.

%spark.sql(deps=p1)
// query the bank data

Paragraph 3.

%spark.sql(deps=p1)
// query the bank data
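
As a rough sketch of how the proposed attribute could be read (in Python,
not Zeppelin's actual parser): the function name parse_dependencies, the
regular expression, and the p<N> paragraph ids are assumptions made for
illustration. Only the single-dependency form shown above is handled, since
the proposal does not define a syntax for several dependencies.

```python
import re

# Hypothetical pattern for an indicator such as "%spark.sql(deps=p1)".
INDICATOR = re.compile(r"^%(?P<interp>[\w.]+)(?:\((?P<attrs>[^)]*)\))?\s*$")


def parse_dependencies(indicators):
    """Map paragraph ids (p1, p2, ...) to the ids they depend on.

    `indicators` holds the first line of each paragraph, in note order.
    A paragraph without a deps attribute depends on the one before it.
    """
    deps = {}
    for index, line in enumerate(indicators, start=1):
        match = INDICATOR.match(line.strip())
        if match is None:
            raise ValueError(f"not an interpreter indicator: {line!r}")
        attrs = {}
        for pair in filter(None, (match.group("attrs") or "").split(",")):
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
        if "deps" in attrs:
            value = attrs["deps"]
            deps[f"p{index}"] = [value] if value else []
        elif index > 1:
            deps[f"p{index}"] = [f"p{index - 1}"]  # default: previous paragraph
        else:
            deps[f"p{index}"] = []  # the first paragraph has no dependency
    return deps


print(parse_dependencies(["%spark", "%spark.sql(deps=p1)", "%spark.sql(deps=p1)"]))
```

Leaving out the attribute everywhere yields the chain p1, p2, p3, so
existing notes would keep running sequentially under this default.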




afancy <grou...@gmail.com> wrote on Fri, Sep 29, 2017 at 5:35 PM:

> +1
>
> I think this is one of the most important features. I don't know why this
> requirement has been skipped.
>
> /afancy
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belou...@tinkoff.ru> wrote:
>
>> Hello, users!
>>
>> At the moment our analysts often use mixes of interpreters in their notes.
>>
>> For example, they prepare data using %jdbc and then use it in %pyspark.
>> Besides, they often use scheduling to produce regular reports, and they
>> have to do something like `time.sleep()` to wait for the data from %jdbc.
>> It doesn't guarantee the result and doesn't look cool.
>>
>>
>>
>> You can find early attempts to implement sequential running of all
>> paragraphs in [1].
>>
>> We are really interested in implementation of the issue [2] and are ready
>> to solve it.
>>
>> It seems a good idea to discuss any requirements.
>>
>> My idea is to introduce a note setting that defines the type of running to
>> use (parallel or sequential) and leave "Run all" as the only button that
>> runs all the cells in the note. This makes sequential or parallel
>> running a `note option` rather than a `run option`.
>>
>> The option will be controlled by a nearby button, as shown:
>>
>> [image:
>> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>>
>>
>>
>>
>>
>> For new notes the default state would be "Run sequential all"; for old
>> notes, "Run parallel for interpreters".
>>
>> We would be glad to hear any thoughts.
>>
>> Thank you.
>>
>>
>>
>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>
>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>
>>
>>
>>
>>
>>
>>
>>
>> *Maksim Belousov *
>>
>>
>>
>
>
