Hi Chen,
Fetch failures seem to be hitting people on 1.1.0 much more often --
there's a bug tracking fetch failures at
https://issues.apache.org/jira/browse/SPARK-3633 that might be the same as
what you're seeing. Can you take a look at that bug and, if it matches what
you're observing, follow
I am running the job on 500 executors, each with 8G and 1 core.
I see lots of fetch failures in the reduce stage when running a simple
reduceByKey:
map tasks -> 4000
reduce tasks -> 200
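
For a sense of scale, here is a rough back-of-the-envelope sketch of the shuffle these numbers imply. It assumes every reduce task fetches one block from every map task and that map outputs are spread evenly across executors; neither is confirmed for this particular job. The helper name shuffle_fetch_counts is made up for illustration and is not part of Spark.

```python
# Rough shuffle sizing for the job described above.
# Assumptions (not confirmed by the thread): each reduce task fetches one
# block from every map task, and map outputs are evenly spread over executors.

def shuffle_fetch_counts(map_tasks, reduce_tasks, executors):
    """Return approximate block counts for a map -> reduce shuffle."""
    return {
        # One block per (map task, reduce task) pair.
        "total_blocks": map_tasks * reduce_tasks,
        # Each reduce task pulls its partition from every map output.
        "blocks_per_reduce_task": map_tasks,
        # Average number of map outputs served by each executor.
        "map_outputs_per_executor": map_tasks / executors,
    }

counts = shuffle_fetch_counts(map_tasks=4000, reduce_tasks=200, executors=500)
for name, value in counts.items():
    print(name, value)
```

Under those assumptions each reduce task has to complete on the order of 4000 remote fetches, so a single slow or unresponsive executor can show up as a fetch failure in many reduce tasks at once.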
On Mon, Sep 22, 2014 at 12:22 PM, Chen Song wrote:
I am using Spark 1.1.0 and have seen a lot of Fetch Failures due to the
following exception.
java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)