Something similar happened to one of our jobs as well - Spark Streaming on
YARN, deployed on AWS.
One of the jobs was consistently taking 10–15X longer on one machine. Same
data volume, data partitioned really well, etc.

Are you running on AWS or on prem?

We assumed that one of the VMs in Amazon was flaky and decided to restart it,
which led to a host of other issues (the executor on it was never recreated
after the machine rejoined YARN as a healthy node…)

-adrian

From: Robin East
Date: Wednesday, September 16, 2015 at 7:45 PM
To: patcharee
Cc: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Re: spark performance - executor computing time

Is this repeatable? Do you always get one or two executors that are 6 times as
slow? It could be that some of your tasks have more work to do (maybe you are
filtering some records out?). If it's always one particular worker node, is
there something about the machine configuration (e.g. CPU speed) that means
the processing takes longer?

—————————————————————————————
Robin East
Spark GraphX in Action by Michael S Malak and Robin East
http://www.manning.com/books/spark-graphx-in-action

On 15 Sep 2015, at 12:35, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I was running a job (on Spark 1.5 + YARN + Java 8). In a stage doing a lookup
(org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:873))
there was one executor whose computing time was more than 6 times the median.
This executor had almost the same shuffle read size and similarly low GC time
as the others.

What can impact the executor computing time? Any suggestions on what
parameters I should monitor/configure?
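
(Not part of the original question, but for reference: a minimal sketch of how
PairRDDFunctions.lookup behaves; the RDD name `pairs` is made up for
illustration. Without a known partitioner, lookup(key) launches a task on
every partition and filters for the key, so one skewed or slow partition shows
up as one long task; with a partitioner it only runs on the partition that
owns the key.)

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object LookupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("lookup-sketch"))

        // Hypothetical pair RDD standing in for the one in the slow stage.
        val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

        // No partitioner: lookup scans every partition, so a skewed
        // partition means one straggler task.
        println(pairs.partitioner)   // None
        val v1 = pairs.lookup(42)

        // With a partitioner: lookup runs a job on the single partition
        // that owns the key.
        val byKey = pairs.partitionBy(new HashPartitioner(100)).cache()
        println(byKey.partitioner)   // Some(org.apache.spark.HashPartitioner@...)
        val v2 = byKey.lookup(42)

        sc.stop()
      }
    }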

BR,
Patcharee


