Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-06 Thread Ryan Blue
I agree with Reynold. We don't need to use a separate pool, which would
have the problem you raised about FIFO. We just need to do the planning
outside of the scheduler loop. The call site thread sounds like a
reasonable place to me.

On Mon, Mar 5, 2018 at 12:56 PM, Reynold Xin  wrote:

> Rather than using a separate thread pool, perhaps we can just move the
> prep code to the call site thread?
>
>
> On Sun, Mar 4, 2018 at 11:15 PM, Ajith shetty 
> wrote:
>
>> DAGScheduler becomes a bottleneck in a cluster when multiple JobSubmitted
>> events have to be processed, because DAGSchedulerEventProcessLoop is
>> single-threaded and blocks other events in the queue, such as TaskCompletion.
>>
>> Processing a JobSubmitted event can be time-consuming depending on the
>> nature of the job (for example, calculating parent stage dependencies,
>> shuffle dependencies, and partitions), and thus it blocks all subsequent
>> events from being processed.
>>
>>
>>
>> I see multiple JIRAs referring to this behavior:
>>
>> https://issues.apache.org/jira/browse/SPARK-2647
>>
>> https://issues.apache.org/jira/browse/SPARK-4961
>>
>>
>>
>> Similarly, in my cluster some jobs' partition calculation is time-consuming
>> (similar to the stack trace in SPARK-2647), which slows down the
>> DAGSchedulerEventProcessLoop and causes user jobs to slow down, even if
>> their tasks finish within seconds, because TaskCompletion events are
>> processed at a slower rate due to the blockage.
>>
>>
>>
>> I think we can split the JobSubmitted event into two events:
>>
>> Step 1. JobSubmittedPreparation - runs in a separate thread on job
>> submission; this covers the work in
>> org.apache.spark.scheduler.DAGScheduler#createResultStage.
>>
>> Step 2. JobSubmittedExecution - if Step 1 succeeds, fire an event to
>> DAGSchedulerEventProcessLoop and let it process the output of
>> org.apache.spark.scheduler.DAGScheduler#createResultStage.
>>
>>
>>
>> One effect of doing this is that job submissions may no longer be FIFO,
>> depending on how much time Step 1 takes.
>>
>>
>>
>> Does the above solution suffice for the problem described? And are there
>> any other side effects of this solution?
>>
>>
>>
>> Regards
>>
>> Ajith
>>
>
>


-- 
Ryan Blue
Software Engineer
Netflix
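
The proposed split in the quoted email can be illustrated with a minimal sketch. All class and method names here are hypothetical stand-ins, not Spark's actual scheduler code: a single-threaded executor models DAGSchedulerEventProcessLoop, and the slow planning phase runs on a separate pool before only its result is handed to the loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch (hypothetical names, not Spark's actual scheduler) of the
// proposed split: the expensive preparation step runs on a worker pool, and
// only the cheap execution step runs on the single-threaded event loop.
public class TwoPhaseSubmit {
    // Stand-in for DAGSchedulerEventProcessLoop: one thread, events in order.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // Pool for the hypothetical JobSubmittedPreparation phase.
    private final ExecutorService prepPool = Executors.newFixedThreadPool(4);

    public CompletableFuture<String> submitJob(String jobName) {
        // Step 1 (JobSubmittedPreparation): slow planning, off the event loop.
        return CompletableFuture
            .supplyAsync(() -> "stages-of-" + jobName, prepPool) // stand-in for createResultStage
            // Step 2 (JobSubmittedExecution): hand the planned result to the
            // event loop, which now only does cheap bookkeeping.
            .thenApplyAsync(stages -> "scheduled:" + stages, eventLoop);
    }

    public void shutdown() {
        prepPool.shutdown();
        eventLoop.shutdown();
    }
}
```

Because Step 1 durations vary per job, two jobs submitted in order can reach Step 2 out of order, which is exactly the FIFO caveat raised above.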


Re: Silencing messages from Ivy when calling spark-submit

2018-03-06 Thread Bryan Cutler
Cool, hopefully that will work.  I don't know which setting it would be, but
it might be somewhere under here:
http://ant.apache.org/ivy/history/latest-milestone/settings/outputters.html.
The docs are pretty difficult to sort through, and I often found myself
reading the source to understand some settings.  If you happen to figure
out the answer, please report back here.  I'm sure others would find it
useful too.

Bryan
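
For reference, the `spark.jars.ivySettings` approach discussed in this thread is wired up roughly as below. The file path and package coordinates are placeholders, and the actual ivysettings.xml contents that would lower Ivy's log level are exactly the open question here:

```shell
# Placeholder path and coordinates -- what goes inside ivysettings.xml to
# silence Ivy is the unresolved question in this thread.
spark-submit \
  --conf spark.jars.ivySettings=/path/to/ivysettings.xml \
  --packages com.example:mylib:1.0.0 \
  my_app.py
```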

On Mon, Mar 5, 2018 at 3:50 PM, Nicholas Chammas  wrote:

> Oh, I didn't know about that. I think that will do the trick.
>
> Would you happen to know what setting I need? I'm looking at the Ivy
> settings documentation, but it's a bit overwhelming. I'm basically looking
> for a way to set the overall Ivy log level to WARN or higher.
>
> Nick
>
> On Mon, Mar 5, 2018 at 2:11 PM Bryan Cutler  wrote:
>
>> Hi Nick,
>>
>> I'm not sure about changing the default to warnings only, because some
>> might find the resolution output useful, but you can specify your own Ivy
>> settings file with "spark.jars.ivySettings" pointing to your
>> ivysettings.xml file.  Would configuring it there work for you?
>>
>> Bryan
>>
>> On Mon, Mar 5, 2018 at 8:20 AM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> I couldn’t get an answer anywhere else, so I thought I’d ask here.
>>>
>>> Is there a way to silence the messages that come from Ivy when you call
>>> spark-submit with --packages? (For the record, I asked this question on
>>> Stack Overflow.)
>>>
>>> Would it be a good idea to configure Ivy by default to only output
>>> warnings or errors?
>>>
>>> Nick
>>>
>>
>>


Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-06 Thread Shivaram Venkataraman
The problem with doing the work in the call-site thread is that a number
of data structures are updated during job submission, and those data
structures are guarded by the event loop, which ensures only one thread
ever accesses them.  I don't think there is a very easy fix for this
given the structure of the DAGScheduler.

Thanks
Shivaram

On Tue, Mar 6, 2018 at 8:53 AM, Ryan Blue  wrote:
> I agree with Reynold. We don't need to use a separate pool, which would have
> the problem you raised about FIFO. We just need to do the planning outside
> of the scheduler loop. The call site thread sounds like a reasonable place
> to me.
>
> On Mon, Mar 5, 2018 at 12:56 PM, Reynold Xin  wrote:
>>
>> Rather than using a separate thread pool, perhaps we can just move the
>> prep code to the call site thread?
>>
>>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-06 Thread Reynold Xin
It's mostly just hash maps from some IDs to some state, and those could
probably be replaced with concurrent hash maps?

(I haven't actually looked at the code and am just guessing based on
recollection.)
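
The suggestion above can be sketched like this (hypothetical names, not Spark's code): with a ConcurrentHashMap, multiple submitter threads can update an id-to-state map directly instead of serializing through the event loop.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hedged sketch (hypothetical names): a concurrent id-to-state map that
// several threads can update without going through a single event loop.
public class ConcurrentIds {
    private final ConcurrentHashMap<Integer, LongAdder> stageAttempts =
        new ConcurrentHashMap<>();

    public void recordAttempt(int stageId) {
        // computeIfAbsent is atomic, so concurrent callers never lose an entry.
        stageAttempts.computeIfAbsent(stageId, id -> new LongAdder()).increment();
    }

    public long attempts(int stageId) {
        LongAdder a = stageAttempts.get(stageId);
        return a == null ? 0L : a.sum();
    }
}
```

One caveat: per-key atomicity does not make invariants that span several maps atomic, which is presumably the concern about structures guarded by the event loop.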

On Tue, Mar 6, 2018 at 10:42 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> The problem with doing the work in the call-site thread is that a number
> of data structures are updated during job submission, and those data
> structures are guarded by the event loop, which ensures only one thread
> ever accesses them.  I don't think there is a very easy fix for this
> given the structure of the DAGScheduler.
>
> Thanks
> Shivaram
>
> On Tue, Mar 6, 2018 at 8:53 AM, Ryan Blue 
> wrote:
> > I agree with Reynold. We don't need to use a separate pool, which would
> have
> > the problem you raised about FIFO. We just need to do the planning
> outside
> > of the scheduler loop. The call site thread sounds like a reasonable
> place
> > to me.
> >
> > On Mon, Mar 5, 2018 at 12:56 PM, Reynold Xin 
> wrote:
> >>
> >> Rather than using a separate thread pool, perhaps we can just move the
> >> prep code to the call site thread?
> >>
> >>
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>