Re: spark time out

2014-09-23 Thread Andrew Ash
Hi Chen, The fetch failures seem to be happening a lot more to people on 1.1.0 -- there's a bug tracking fetch failures at https://issues.apache.org/jira/browse/SPARK-3633 that might be the same as what you're seeing. Can you take a peek at that bug and if it matches what you're observing follow

Re: spark time out

2014-09-23 Thread Chen Song
I am running the job on 500 executors, each with 8G and 1 core. I see lots of fetch failures in the reduce stage when running a simple reduceByKey.
map tasks -> 4000
reduce tasks -> 200
On Mon, Sep 22, 2014 at 12:22 PM, Chen Song wrote:
> I am using Spark 1.1.0 and have seen a lot of Fetch Failure

spark time out

2014-09-22 Thread Chen Song
I am using Spark 1.1.0 and have seen a lot of Fetch Failures due to the following exception.
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
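
A possible mitigation, sketched under assumptions: in Spark 1.x the 60 sec limit in the exception above appears to correspond to the `spark.core.connection.ack.wait.timeout` setting (default 60 seconds). Raising it can make long pauses on a remote executor, such as GC, less likely to surface as fetch failures, though it may only mask the underlying cause. The object name, app name, and the value 600 below are hypothetical, and the master URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver skeleton; only the ack-timeout property matters here.
object AckTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ack-timeout-sketch") // hypothetical app name
      // Seconds to wait for an ack before sendMessageReliably gives up.
      // 600 is an illustrative value, not a recommendation.
      .set("spark.core.connection.ack.wait.timeout", "600")

    val sc = new SparkContext(conf)
    try {
      // The reduceByKey job from the thread would run here.
    } finally {
      sc.stop() // always release the context, even if the job fails
    }
  }
}
```

The same property can instead be placed in `conf/spark-defaults.conf` so that the job code stays unchanged.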