Something similar happened to our job as well - Spark Streaming, YARN, deployed on AWS. One of the jobs was consistently taking 10–15x longer on one machine: same data volume, data partitioned really well, etc.
Are you running on AWS or on prem? We were assuming that one of the VMs in Amazon was flaky and decided to restart it, leading to a host of other issues (the executor on it was never recreated after the machine joined back in YARN as a healthy node…).

-adrian

From: Robin East
Date: Wednesday, September 16, 2015 at 7:45 PM
To: patcharee
Cc: user@spark.apache.org
Subject: Re: spark performance - executor computing time

Is this repeatable? Do you always get one or two executors that are 6 times as slow? It could be that some of your tasks have more work to do (maybe you are filtering some records out?). If it's always one particular worker node, is there something about the machine configuration (e.g. CPU speed) that means the processing takes longer?

-------------------------------------------------------------
Robin East
Spark GraphX in Action, Michael S Malak and Robin East
http://www.manning.com/books/spark-graphx-in-action

On 15 Sep 2015, at 12:35, patcharee <patcharee.thong...@uni.no> wrote:

Hi,

I was running a job (on Spark 1.5 + YARN + Java 8). In a stage that performs a lookup (org.apache.spark.rdd.PairRDDFunctions.lookup(PairRDDFunctions.scala:873)), one executor's computing time was more than 6 times the median. This executor had almost the same shuffle read size and low GC time as the others. What can impact executor computing time? Any suggestions on what parameters I should monitor/configure?

BR,
Patcharee
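
A minimal sketch of the kind of check Robin's questions point at, assuming a (key, value) pair RDD like the one behind the lookup stage (the input path, partition count and key below are hypothetical): count the records in each partition to see whether a few tasks simply have more work to do. Note that lookup() on an RDD with a known partitioner only searches the single partition the key maps to, so its cost follows that partition's size.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object SkewCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("skew-check"))

    // Stand-in for the real (key, value) RDD feeding the slow stage.
    val pairs = sc.textFile("hdfs:///some/input")              // hypothetical path
      .map(line => (line.split(",")(0), line))
      .partitionBy(new HashPartitioner(200))                   // hypothetical partition count

    // Records per partition: a few partitions far above the median would
    // explain a few tasks taking far longer than the median.
    val counts = pairs
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
    counts.sortBy { case (_, n) => -n }.take(10).foreach { case (idx, n) =>
      println(s"partition $idx -> $n records")
    }

    // With a known partitioner, lookup() only searches the partition the key
    // maps to; without one, it filters every partition.
    val values = pairs.lookup("someKey")                       // hypothetical key
    println(s"found ${values.size} values for someKey")

    sc.stop()
  }
}

If one or two partitions dominate these counts, the slow executor is likely just the one holding them rather than a flaky machine.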