Re: 10hrs of Scheduler Delay

Sanders, Isaac B Mon, 25 Jan 2016 06:00:18 -0800

Is the thread dump the stack trace you are talking about? If so, I will see if 
I can capture the few different stages I have seen it in.


Thanks for the help. I was able to do it for 0.1% of my data. I will create the JIRA.

Thanks,
Isaac

On Jan 25, 2016, at 8:51 AM, Ted Yu <[email protected]> wrote:

Opening a JIRA is fine.

See if you can capture a stack trace during the hung stage and attach it to the JIRA so that we have more clues.

Thanks

On Jan 25, 2016, at 4:25 AM, Darren Govoni <[email protected]> wrote:

We should probably open a ticket for this.
There's definitely a deadlock occurring in Spark under certain conditions.

The only clue I have is that it always happens on the last stage, and it does seem sensitive to scale. If my job has 300 MB of data, I'll see the deadlock, but if I only run 10 MB of it, it will succeed. This suggests a serious, fundamental scaling problem.

Workers have plenty of resources.



Sent from my Verizon Wireless 4G LTE smartphone


-------- Original message --------
From: "Sanders, Isaac B" <[email protected]>
Date: 01/24/2016 2:54 PM (GMT-05:00)
To: Renu Yadav <[email protected]>
Cc: Darren Govoni <[email protected]>, Muthu Jayakumar <[email protected]>, Ted Yu <[email protected]>, [email protected]
Subject: Re: 10hrs of Scheduler Delay

I am not getting anywhere with any of the suggestions so far. :(

I'm trying some more outlets and will share any solution I find.

- Isaac

On Jan 23, 2016, at 1:48 AM, Renu Yadav <[email protected]> wrote:

If you turn spark.speculation on, that might help. It worked for me.
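The sketch below is a rough illustration of what spark.speculation does. It models only the straggler check in plain Python and does not call Spark. Speculation itself can be enabled with --conf spark.speculation=true on spark-submit. The model assumes the documented defaults spark.speculation.quantile = 0.75 and spark.speculation.multiplier = 1.5. The function name and its inputs are hypothetical, and the code describes the rule only. It is not Spark's scheduler code.

```python
import math
from statistics import median


def speculatable(finished, running, total_tasks, quantile=0.75, multiplier=1.5):
    """Simplified model of Spark's speculation check for one stage (hypothetical helper).

    finished:    runtimes (seconds) of tasks in the stage that already succeeded
    running:     elapsed runtimes (seconds) of tasks still running
    total_tasks: number of tasks in the stage
    Returns the indices (into `running`) of tasks that would be relaunched.
    """
    # Speculation only kicks in once this many tasks in the stage have succeeded.
    min_finished = math.floor(quantile * total_tasks)
    if not finished or len(finished) < min_finished:
        return []
    # A running task counts as a straggler if it has run longer than
    # multiplier x the median runtime of the successful tasks.
    threshold = multiplier * median(finished)
    return [i for i, elapsed in enumerate(running) if elapsed > threshold]


# 8 of 10 tasks finished in 10 s each; of the two still running, only the
# 600 s one exceeds 1.5 x 10 s and would get a speculative copy.
print(speculatable([10] * 8, [12, 600], total_tasks=10))
```

Speculation only relaunches tasks that are slow compared with their finished peers. It may therefore not help when every remaining task is stuck. It may also fail to help when a task is slow because of its input (for example, a skewed partition), since the relaunched copy would do the same work.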

On Sat, Jan 23, 2016 at 3:21 AM, Darren Govoni <[email protected]> wrote:
Thanks for the tip, I will try it. But this is the kind of thing Spark is supposed to figure out and handle, or at least not get stuck on forever.


-------- Original message --------
From: Muthu Jayakumar <[email protected]>
Date: 01/22/2016 3:50 PM (GMT-05:00)
To: Darren Govoni <[email protected]>, "Sanders, Isaac B" <[email protected]>, Ted Yu <[email protected]>
Cc: [email protected]
Subject: Re: 10hrs of Scheduler Delay

Does increasing the number of partitions help? You could try something like 3 times what you currently have.
Another trick I used was to split the problem into multiple DataFrames, run them sequentially, persist each result, and then union the results.

Hope this helps.
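The split, run sequentially, then union trick can be sketched in plain Python. List slices stand in for the separate DataFrames, and list concatenation stands in for union. The names run_in_batches and process are hypothetical. This sketch shows the pattern only; it does not use Spark's API. The pattern gives the same answer as one big run only when each row's result depends on its own batch alone. A nearest-neighbour search would not be safe to split this way, because a point's true neighbour may sit in another batch.

```python
def run_in_batches(rows, num_batches, process):
    """Process `rows` in `num_batches` sequential chunks and concatenate the results.

    rows:        input records (stand-in for one large DataFrame)
    num_batches: how many smaller pieces to run one after another (>= 1)
    process:     function applied to each batch, returning a list of results
    """
    if not rows:
        return []
    size = -(-len(rows) // num_batches)  # ceiling division so no rows are dropped
    results = []
    for start in range(0, len(rows), size):
        batch = rows[start:start + size]
        # Each batch is handled on its own, like running one smaller job at a time
        # and keeping (persisting) its output.
        results.append(process(batch))
    # Concatenate the per-batch outputs, the analogue of a union.
    return [r for part in results for r in part]


doubled = run_in_batches(list(range(10)), 3, lambda batch: [x * 2 for x in batch])
print(doubled)
```

A per-row transformation like the doubling above is safe to batch. Anything that compares rows across the whole dataset needs to be checked before applying the trick.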

On Fri, Jan 22, 2016, 3:48 AM Darren Govoni <[email protected]> wrote:
Me too. I had to shrink my dataset to get it to work. For us, at least, Spark seems to have scaling issues.


-------- Original message --------
From: "Sanders, Isaac B" <[email protected]>
Date: 01/21/2016 11:18 PM (GMT-05:00)
To: Ted Yu <[email protected]>
Cc: [email protected]
Subject: Re: 10hrs of Scheduler Delay

I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly 
and didn't hang like this. This dataset is closer to k=10, n=4.4m, but I am 
using more resources on this one.

- Isaac
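A back-of-envelope comparison of the two runs (n=5000 vs n=4.4m) shows why a slow stage might still be making progress rather than deadlocked. The two cost models are assumptions. Nothing in this thread confirms how DistanceToNearestNeighborDriver actually scales, and the effect of k is ignored.

```python
# Hypothetical back-of-envelope: how much more work the larger run implies
# under two assumed cost models (neither is confirmed for spark_dbscan).
SMALL_N = 5_000
LARGE_N = 4_400_000


def growth(small_n, large_n, exponent):
    """Ratio of work for large_n vs small_n if cost grows as n ** exponent."""
    return (large_n / small_n) ** exponent


linear = growth(SMALL_N, LARGE_N, 1)     # cost ~ n
quadratic = growth(SMALL_N, LARGE_N, 2)  # cost ~ n^2, e.g. naive all-pairs distances

print(f"linear:    {linear:,.0f}x")     # linear:    880x
print(f"quadratic: {quadratic:,.0f}x")  # quadratic: 774,400x
```

If the work were anywhere near quadratic in n, a run that "worked quickly" on 5,000 points could plausibly take many hours on 4.4 million, even with more resources.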

On Jan 21, 2016, at 11:06 PM, Ted Yu <[email protected]> wrote:

You may have seen the following on the GitHub page:

Latest commit 50fdf0e on Feb 22, 2015

That was 11 months ago.

Can you search for a similar algorithm which runs on Spark and is newer?

If nothing is found, consider running the project's own tests to determine whether the delay is intrinsic.

Cheers

On Thu, Jan 21, 2016 at 7:46 PM, Sanders, Isaac B <[email protected]> wrote:
That thread seems to be moving; it oscillates between a few different traces. Maybe it is working, but it seems odd that it would take that long.

This is 3rd party code, and after looking at some of it, I think it might not 
be as Spark-y as it could be.

I linked it below. I don't know a lot about Spark, so it might be fine, but I have my suspicions.

https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

- Isaac

On Jan 21, 2016, at 10:08 PM, Ted Yu <[email protected]> wrote:

You may have noticed the following. Did this indicate prolonged computation in your code?
