Re: Implementing run all paragraphs sequentially

Mohit Jaggi Thu, 05 Oct 2017 16:48:24 -0700

+1 for serial run by default. 

Sent from my iPhone


> On Oct 5, 2017, at 3:36 PM, moon soo Lee <[email protected]> wrote:
> 
> I'd like us to also consider simplicity of use.
> 
> We can have two different modes, or two different run buttons, for Serial and 
> Parallel runs. This gives the flexibility of choosing between two different 
> schedulers, but for users to understand the difference between the two run 
> buttons, there must be really good UI treatment. 
> 
> I see there is high user demand for running a notebook sequentially. And I 
> think there are 3 action items in this discussion thread.
> 
> 1. Change the current run all button behavior from Parallel to Serial.
> 2. Provide both Parallel and Serial run buttons with really good UI treatment.
> 3. Provide a DAG.
> 
> I think 1) does not stop 2) and 3) in the future. 2) also does not stop 3) in 
> the future.
> 
> So, why don't we try 1) first and keep discussing and polishing the ideas for 
> 2) and 3)?
> 
> 
> Thanks,
> moon
> 
>> On Mon, Oct 2, 2017 at 10:22 AM Michael Segel <[email protected]> 
>> wrote:
>> Whoa! 
>> Seems I walked into something. 
>> 
>> Herval, 
>> 
>> What do you suggest?  A simple switch that runs everything in serial, or 
>> everything in parallel? 
>> That would be a very bad idea. 
>> 
>> I gave you an example of a class of solutions where you don’t want that 
>> behavior. 
>> E.g. unit testing, where you have one setup and then run several unit tests 
>> in parallel. 
>> 
>> If that’s not enough for you… how about if you want to test 
>> producer/consumer problems?  
>> 
>> Or if you want to define classes in one paragraph but then call on them in 
>> later paragraphs. If everything runs in parallel starting at time 0, you 
>> can’t do this.
>> 
>> 
>> So, if you want to do it right the first time… you need to establish a way 
>> to control the dependency of paragraphs. This isn’t rocket science. 
>> And frankly not that complex. 
>> 
>> BTW, this is the user list not the dev list… 
>> 
>> Just saying…  ;-)
>> 
>> 
>>> On Oct 2, 2017, at 11:24 AM, Herval Freire <[email protected]> wrote:
>>> 
>>>  "Nice to have" isn't a very strong requirement. I strongly suggest you 
>>> really, really think about this before you start pounding out an 
>>> overengineered solution to a non-issue :-)
>>> 
>>> h
>>> 
>>>> On Mon, Oct 2, 2017 at 9:12 AM, Michael Segel <[email protected]> 
>>>> wrote:
>>>> Yes… 
>>>>  You have a bunch of unit tests you can run in parallel where you only 
>>>> need one constructor and one cleanup. 
>>>> 
>>>> I would strongly suggest that you really, really think about this long and 
>>>> hard before you start to pound code. 
>>>> It's going to be harder to back out and fix than if you take the time to 
>>>> think through the problem and not make a dumb mistake.
>>>> 
>>>>> On Oct 2, 2017, at 11:02 AM, Herval Freire <[email protected]> wrote:
>>>>> 
>>>>> Did anyone request such a case ("running some in parallel and some in 
>>>>> sequence")? I haven't seen any requests for this in the wild (nor on this 
>>>>> thread), other than a theoretical "what if", which is totally fine when 
>>>>> it doesn't introduce a lot of unnecessary complexity for little to no 
>>>>> gain (which seems to be the case here).
>>>>> 
>>>>> h
>>>>> 
>>>>>> On Mon, Oct 2, 2017 at 8:48 AM, Michael Segel 
>>>>>> <[email protected]> wrote:
>>>>>> Because that simplicity doesn’t work. 
>>>>>> 
>>>>>> You will want to run some things serial and some things in parallel. 
>>>>>> 
>>>>>> Which is why you will need a dependency graph.
>>>>>> 
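The dependency graph argued for above can be sketched outside Zeppelin. This is a minimal illustration only: `run_dag`, `paragraphs`, and `wait_for` are hypothetical names, not Zeppelin APIs, and paragraphs are stood in for by plain Python callables. It assumes Python 3.9+ for the standard-library `graphlib` module. A paragraph is submitted as soon as everything it waits for has finished, so one setup paragraph can be followed by several test paragraphs running in parallel.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(paragraphs, wait_for, max_workers=4):
    """Run each paragraph once everything it waits for has finished.

    paragraphs: dict mapping a paragraph id to a zero-argument callable.
    wait_for:   dict mapping a paragraph id to the ids it depends on.
    Returns the paragraph ids in the order they finished.
    """
    graph = {pid: set(wait_for.get(pid, ())) for pid in paragraphs}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    finished = []
    running = {}  # future -> paragraph id
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every paragraph whose dependencies are all done.
            for pid in sorter.get_ready():
                running[pool.submit(paragraphs[pid])] = pid
            # Block until at least one running paragraph completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                pid = running.pop(future)
                future.result()  # re-raise any failure from the paragraph
                finished.append(pid)
                sorter.done(pid)  # may unlock dependent paragraphs
    return finished
```

With `wait_for = {"test_a": ["setup"], "test_b": ["setup"]}`, the two tests start only after `setup` completes, and their relative order is left to the thread pool.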
>>>>>>> On Oct 2, 2017, at 10:40 AM, Herval Freire <[email protected]> wrote:
>>>>>>> 
>>>>>>> Why do you need rules and graphs and any of that to support running 
>>>>>>> everything sequentially or everything in parallel?
>>>>>>> 
>>>>>>> 3) add a “run mode” to the note. If it’s “sequential”, run the 
>>>>>>> paragraphs one at a time, in the order they’re defined. If parallel, 
>>>>>>> run using current scheme (as many at the same time as the threadpool 
>>>>>>> permits)
>>>>>>> 
>>>>>>> Simpler and covers all cases, imo
>>>>>>> 
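The note-level "run mode" proposed above could look roughly like the following. This is a sketch under stated assumptions: `run_all` and the mode names `"sequential"` and `"parallel"` are hypothetical, paragraphs are modelled as plain callables, and a thread pool stands in for whatever scheduler the interpreters actually use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(paragraphs, run_mode="sequential", max_workers=4):
    """Run every paragraph of a note according to a note-level run mode.

    paragraphs: list of zero-argument callables, in note order.
    run_mode:   "sequential" or "parallel" (hypothetical setting values).
    Returns the paragraph results in note order.
    """
    if run_mode == "sequential":
        # One at a time, in the order the paragraphs are defined.
        return [paragraph() for paragraph in paragraphs]
    if run_mode == "parallel":
        # As many at the same time as the thread pool permits.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(paragraph) for paragraph in paragraphs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown run mode: {run_mode!r}")
```

Because the mode is an argument rather than a property of the button, the same call would serve a button, a scheduler, or an API trigger.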
>>>>>>>   
>>>>>>> From: Polyakov Valeriy <[email protected]>
>>>>>>> Sent: Monday, October 2, 2017 8:24:35 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: RE: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Let me try to summarize the discussion. Evidently, the current behavior 
>>>>>>> of running notes does not meet actual requirements. The most important 
>>>>>>> thing we need is the ability to run sequentially. At the same time, we 
>>>>>>> want to keep the ability to run in parallel. We discussed that the most 
>>>>>>> suitable way to model dependencies between paragraphs is a DAG 
>>>>>>> (directed acyclic graph). These dependencies should therefore be 
>>>>>>> defined in the note, and the running order should not depend on how we 
>>>>>>> launch it (button / scheduler / API). Our objectives, then, are to 
>>>>>>> implement a “dependency definition engine” and to use it in the “run 
>>>>>>> engine”. What are the options?
>>>>>>> 1)      Explicit dependency definition.
>>>>>>> We could take as a rule that each paragraph waits for ALL previous 
>>>>>>> paragraphs to finish executing. Then we add a paragraph option “Wait 
>>>>>>> for …” that selects the paragraph whose completion we wait for. When 
>>>>>>> the option is set, the paragraph starts immediately after the selected 
>>>>>>> paragraph finishes. This pattern allows us to implement a fully 
>>>>>>> parallel DAG running order. What are the disadvantages? They all come 
>>>>>>> down to the same thing: the dependency management process is hard for 
>>>>>>> users to understand (and, in my personal view, the functionality is 
>>>>>>> probably redundant). First, we would have to use the strange paragraph 
>>>>>>> ID format, which in addition is hidden. We could come up with visible, 
>>>>>>> readable paragraph ID aliases, but then we would need to check for 
>>>>>>> duplicates. Second, some scenarios require changing existing 
>>>>>>> dependencies (e.g. to add a new paragraph between one paragraph and 
>>>>>>> the group that depends on it, you have to change the “Wait for …” 
>>>>>>> option for each paragraph in the group).
>>>>>>> 2)      Implicit dependency definition.
>>>>>>> We could take as a rule that each paragraph waits for ALL previous 
>>>>>>> paragraphs to finish executing. Then we add a paragraph option “Run in 
>>>>>>> parallel with previous”, which allows us to create groups of 
>>>>>>> paragraphs that run in parallel. The result is sequential running of 
>>>>>>> paragraph groups: group by group, with the paragraphs inside each 
>>>>>>> group running in parallel. This approach is much easier for users to 
>>>>>>> understand, but the obvious drawback compared with the “Explicit 
>>>>>>> definition” is that the dependency graph is less expressive and the 
>>>>>>> level of parallelism is lower.
>>>>>>> 
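Option 2 above (implicit dependencies) reduces to splitting the note into consecutive groups. A minimal sketch, assuming each paragraph carries a hypothetical boolean flag corresponding to “Run in parallel with previous”; `group_paragraphs` and `run_groups` are illustrative names, not existing Zeppelin functions:

```python
from concurrent.futures import ThreadPoolExecutor


def group_paragraphs(paragraphs):
    """Split a note into groups that run one after another.

    paragraphs: list of (paragraph_id, parallel_with_previous) tuples,
    in note order. A paragraph flagged True joins the group of the
    paragraph before it; otherwise it starts a new group.
    """
    groups = []
    for paragraph_id, parallel_with_previous in paragraphs:
        if parallel_with_previous and groups:
            groups[-1].append(paragraph_id)
        else:
            groups.append([paragraph_id])
    return groups


def run_groups(groups, run_paragraph, max_workers=4):
    """Run groups sequentially; paragraphs inside a group run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for group in groups:
            # list() drains the iterator, so the whole group finishes
            # (and any exception is raised) before the next group starts.
            list(pool.map(run_paragraph, group))
```

For example, a note flagged as `[("setup", False), ("test_a", False), ("test_b", True), ("teardown", False)]` yields the groups `[["setup"], ["test_a", "test_b"], ["teardown"]]`. Inserting a new paragraph between `setup` and the tests needs no changes to the other paragraphs' flags, which is the usability advantage claimed over the explicit “Wait for …” option.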
>>>>>>> I am not sure whether option (1) or (2) is the right one to implement 
>>>>>>> at the moment. I hope to hear from product visionaries which way to 
>>>>>>> choose and to get approval to start implementation.
>>>>>>> Thank you!
>>>>>>>  
>>>>>>>  
>>>>>>> 
>>>>>>> Valeriy Polyakov
>>>>>>> 
>>>>>>>  
>>>>>>> From: Michael Segel [mailto:[email protected]] 
>>>>>>> Sent: Saturday, September 30, 2017 4:22 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Sorry to jump in…  
>>>>>>>  
>>>>>>> If you want to run paragraphs in parallel, you are going to want to 
>>>>>>> have some sort of dependency graph.  Think of a common set up where you 
>>>>>>> need to set up common functions and imports. (setup of %spark.dep) 
>>>>>>>  
>>>>>>> A good example is if your notebook is a bunch of unit tests and you 
>>>>>>> need to build the common tear down / set up methods to be used by the 
>>>>>>> other paragraphs. 
>>>>>>>  
>>>>>>> If you’re going to do that, you’ll need to build out a metadata 
>>>>>>> structure where you can set up your dependencies as well as add things 
>>>>>>> like labels beyond the ids (which only need to be unique to the given 
>>>>>>> notebook). 
>>>>>>>  
>>>>>>> Just my $0.02 
>>>>>>>  
>>>>>>> On Sep 29, 2017, at 1:30 PM, moon soo Lee <[email protected]> wrote:
>>>>>>>  
>>>>>>> Current behavior is as parallel as possible.
>>>>>>> The run notebook button currently submits all paragraphs in a notebook 
>>>>>>> to each interpreter's own scheduler (FIFO, Parallel) at once. Each 
>>>>>>> interpreter's individual scheduler then runs the paragraphs.
>>>>>>>  
>>>>>>> I think we can provide a "sequential" run button for easier use, which 
>>>>>>> submits one paragraph and waits for it to finish before submitting the 
>>>>>>> next paragraph.
>>>>>>>  
>>>>>>> And I think a sequential run button doesn't prevent having a more 
>>>>>>> complex / flexible DAG in the future?
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>  
>>>>>>> On Fri, Sep 29, 2017 at 10:08 AM Mohit Jaggi <[email protected]> 
>>>>>>> wrote:
>>>>>>> What is the current behavior?
>>>>>>>  
>>>>>>> On Fri, Sep 29, 2017 at 6:56 AM, Herval Freire <[email protected]> 
>>>>>>> wrote:
>>>>>>> At least in our case, the notebooks that we need to run sequentially 
>>>>>>> are expected to *always* run sequentially - thus it makes more sense 
>>>>>>> for this to be a note option than a per-run mode
>>>>>>>  
>>>>>>> H
>>>>>>>  
>>>>>>> _____________________________
>>>>>>> From: moon soo Lee <[email protected]>
>>>>>>> Sent: Thursday, September 28, 2017 9:03 PM
>>>>>>> Subject: Re: Implementing run all paragraphs sequentially
>>>>>>> To: <[email protected]>
>>>>>>> 
>>>>>>> 
>>>>>>> This is going to be really useful!
>>>>>>>  
>>>>>>> Curious why you prefer 'note option' instead of 'run option'?
>>>>>>> Could you compare their pros and cons?
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> moon
>>>>>>>  
>>>>>>> On Thu, Sep 28, 2017 at 8:32 AM Herval Freire <[email protected]> 
>>>>>>> wrote:
>>>>>>> +1, our internal users at Twitter also often request this
>>>>>>>  
>>>>>>> From: Belousov Maksim Eduardovich <[email protected]>
>>>>>>> Sent: Thursday, September 28, 2017 8:28:58 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: Implementing run all paragraphs sequentially
>>>>>>>  
>>>>>>> Hello, users!
>>>>>>>  
>>>>>>> At the moment our analysts often mix interpreters in their notes.
>>>>>>> For example, they prepare data using %jdbc and then use it in %pyspark. 
>>>>>>> Besides, they often use scheduling to produce regular reports. And 
>>>>>>> they have to do something like `time.sleep()` to wait for the data from 
>>>>>>> %jdbc. It doesn't guarantee the result and doesn't look cool.
>>>>>>>  
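To illustrate why the `time.sleep()` workaround described above is fragile, here is a small self-contained simulation. It is only an analogy: the two functions stand in for a %jdbc paragraph and a %pyspark paragraph, and all names (`run_note`, `jdbc_paragraph`, `pyspark_paragraph`, `data_ready`) are made up for the example. Instead of guessing a delay, the consumer blocks on an explicit completion signal, which is the guarantee a sequential "run all" would provide without any code in the note.

```python
import threading
import time


def run_note():
    """Simulate a %jdbc paragraph feeding a %pyspark paragraph."""
    table = {}
    data_ready = threading.Event()

    def jdbc_paragraph():
        time.sleep(0.05)           # stands in for a slow query
        table["rows"] = [1, 2, 3]
        data_ready.set()           # signal that the data now exists

    def pyspark_paragraph():
        # Block until the producer reports completion instead of
        # sleeping for a guessed duration; the timeout is a safety net.
        if not data_ready.wait(timeout=5):
            raise TimeoutError("jdbc paragraph did not finish in time")
        return sum(table["rows"])

    producer = threading.Thread(target=jdbc_paragraph)
    producer.start()
    total = pyspark_paragraph()
    producer.join()
    return total
```

A fixed `time.sleep(n)` in the consumer would fail whenever the query takes longer than `n` and waste time whenever it is shorter.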
>>>>>>> You can find early attempts to implement sequential running of all 
>>>>>>> paragraphs in [1].
>>>>>>> We are really interested in implementation of the issue [2] and are 
>>>>>>> ready to solve it.
>>>>>>>  
>>>>>>> It seems a good idea to discuss any requirements.
>>>>>>> My idea is to introduce a note setting that defines the type of run 
>>>>>>> to use (parallel or sequential) and leave "Run all" as the only 
>>>>>>> button running all the cells in the note. This will make sequential or 
>>>>>>> parallel running a `note option` rather than a `run option`.
>>>>>>> The option will be controlled by a nearby button, as shown:
>>>>>>>  
>>>>>>> <~WRD000.jpg>
>>>>>>>  
>>>>>>>  
>>>>>>>  
>>>>>>> For new notes the default state would be "Run sequential all"; for old 
>>>>>>> notes, "Run parallel for interpreters".
>>>>>>>  
>>>>>>> We would be glad to hear any thoughts.
>>>>>>> Thank you.
>>>>>>>  
>>>>>>>  
>>>>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>>>>>>> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>>>>>>>  
>>>>>>>  
>>>>>>> 
>>>>>>> Maksim Belousov
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
