From: "Sanders, Isaac B"
Date: 01/25/2016 8:59 AM (GMT-05:00)
To: Ted Yu
Cc: Darren Govoni, Renu Yadav, Muthu Jayakumar, user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
Is the thread dump the stack trace you are talking about? If so, I will see if I can […]
>>> This suggests a serious fundamental scaling problem.
>>>
>>> Workers have plenty of resources.
>>>
>>>
>>>
>>> Sent from my Verizon Wireless 4G LTE smartphone
>>>
>>>
>>> Original message
To: Renu Yadav <yren...@gmail.com>
Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
I am not getting anywhere with any of the suggestions so far. :(

Trying some more outlets, I will share any solution I find.

- Isaac
On Jan 23, 2016, at 1:48 AM […]
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay
Does increasing the number of partitions help? You could try out something 3 times what you currently have.
Another trick I used was to partition the problem into multiple dataframes and run them sequentially a[…]
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
> Original message
> From: Muthu Jayakumar
> Date: 01/22/2016 3:50 PM (GMT-05:00)
> To: Darren Govoni, "Sanders, Isaac B" <sande...@rose-hulman.edu>, Ted Yu
> Cc: user@spark.apache.org
> Subject: Re: 10hrs of Scheduler Delay
>
> Does increasing the number of partitions help? You could try out something 3 times what you currently have. Another trick I used was to partition the problem into multiple dataframes and run them sequentially a[…]
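[Editor's note] For concreteness, a minimal sketch of the two tricks suggested above, written against the Spark 1.4-era DataFrame API. The input path, the split ratios, and the 3x factor are illustrative assumptions, not values from the original job:

```scala
// Sketch only: paths, names, and the 3x factor are illustrative.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("dbscan-tuning"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.parquet("/path/to/points")

// Trick 1: raise the partition count to ~3x the current value so the
// expensive stage is spread over more, smaller tasks.
val target = df.rdd.partitions.length * 3
val repartitioned = df.repartition(target)

// Trick 2: split the problem into several smaller DataFrames and run
// them one after another instead of as a single huge job.
val parts = repartitioned.randomSplit(Array(0.25, 0.25, 0.25, 0.25))
parts.foreach { part =>
  part.write.mode("append").parquet("/path/to/output")
}
```

Whether this helps depends on where the delay comes from: if tasks are already small, adding partitions increases scheduling overhead instead of reducing it.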
> From: "Sanders, Isaac B"
> Date: 01/21/2016 11:18 PM (GMT-05:00)
> To: Ted Yu
> Cc: user@spark.apache.org
> Subject: Re: 10hrs of Scheduler Delay
>
> I have run the driver on a smaller dataset (k=2, n=5000) and it worked quickly and didn’t hang like this. This dataset is closer to k=10, n=4.4m, but I am using more resources on this one.
>
> - Isaac
On Jan 21, 2016, at 11:06 PM, Ted Yu <yuzhih...@gmail.com> wrote:
You may have seen the following on the GitHub page:

Latest commit 50fdf0e on Feb 22, 2015

That was 11 months ago.
Can you search for a similar algorithm which runs on Spark and is newer?
If nothing is found, consider running the tests coming from the project to determine whether the delay is intrinsic […]
That thread seems to be moving, it oscillates between a few different traces…
Maybe it is working. It seems odd that it would take that long.
This is 3rd party code, and after looking at some of it, I think it might not
be as Spark-y as it could be.
I linked it below. I don’t know a lot about s[…]
You may have noticed the following - did this indicate prolonged computation in your code?

org.apache.commons.math3.util.MathArrays.distance(MathArrays.java:205)
org.apache.commons.math3.ml.distance.EuclideanDistance.compute(EuclideanDistance.java:34)
org.alitouka.spark.dbscan.spatial.DistanceCal[…]
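[Editor's note] To get a rough sense of whether the raw distance arithmetic in those frames could account for hours of work, one sanity check is to time the same commons-math3 call locally. A sketch; the point count and dimensionality below are made-up, not taken from the job:

```scala
// Micro-benchmark sketch of the hot call in the stack trace above.
// 100k points x 10 dimensions are illustrative numbers.
import org.apache.commons.math3.ml.distance.EuclideanDistance
import scala.util.Random

object DistanceBench {
  def main(args: Array[String]): Unit = {
    val dist = new EuclideanDistance
    val rnd = new Random(42)
    val points = Array.fill(100000)(Array.fill(10)(rnd.nextDouble()))
    val probe = Array.fill(10)(rnd.nextDouble())

    val t0 = System.nanoTime()
    var checksum = 0.0 // keep the JIT from eliding the loop
    points.foreach(p => checksum += dist.compute(p, probe))
    val elapsedMs = (System.nanoTime() - t0) / 1e6

    println(f"100k distances in $elapsedMs%.1f ms (checksum $checksum%.3f)")
  }
}
```

A single-machine timing only bounds the per-call cost; DBSCAN's neighborhood search multiplies that by the number of point pairs examined, which is where the scaling bites.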
From: Ted Yu
Date: 01/21/2016 7:44 PM (GMT-05:00)
To: "Sanders, Isaac B"
Cc: user@spark.apache.org
Subject: Re: 10hrs of Scheduler Delay

Looks like you were running on YARN.
What Hadoop version are you using?
Can you capture a few stack traces of the AppMaster during the delay and pastebin them?
Hadoop is: HDP 2.3.2.0-2950
Here is a gist (pastebin) of my versions en masse and a stacktrace:
https://gist.github.com/isaacsanders/2e59131758469097651b
Thanks
On Jan 21, 2016, at 7:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:
Looks like you were running on YARN.
What Hadoop version are you using?
Can you capture a few stack traces of the AppMaster during the delay and pastebin them?
Thanks
On Thu, Jan 21, 2016 at 8:08 AM, Sanders, Isaac B wrote:
The Spark version is 1.4.1.
The logs are full of standard fare, nothing like an exception or even interesting [INFO] lines.
Here is the script I am using:
https://gist.github.com/isaacsanders/660f480810fbc07d4df2
Thanks
Isaac
On Jan 21, 2016, at 11:03 AM, Ted Yu <yuzhih...@gmail.com> wrote:
Can you provide a bit more information?
- command line for submitting the Spark job
- version of Spark
- anything interesting from driver / executor logs?
Thanks
On Thu, Jan 21, 2016 at 7:35 AM, Sanders, Isaac B wrote:
Hey all,
I am a CS student in the United States working on my senior thesis.
My thesis uses Spark, and I am encountering some trouble.
I am using https://github.com/alitouka/spark_dbscan, and to determine
parameters, I am using the utility class they supply,
org.alitouka.spark.dbscan.explorato[…]