I am using Spark 1.1.0 and am seeing many fetch failures caused by the following exception:

java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
    at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:852)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.network.ConnectionManager$$anon$5.run(ConnectionManager.scala:852)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)

I have increased spark.core.connection.ack.wait.timeout to 120 seconds. That helped, but only slightly. I am fairly confident the failures are not caused by GC pauses on the executors. What else could be causing this?

Chen
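
P.S. For reference, the timeout change looks roughly like the fragment below. This is only a sketch: it assumes the property is set in conf/spark-defaults.conf, although the same key could equally be set on the SparkConf in the application or passed with spark-submit --conf. In Spark 1.x the value is interpreted as a number of seconds.

```
# conf/spark-defaults.conf (assumed location; adjust to your deployment)
# Wait up to 120 seconds for an ack before giving up (default is 60).
spark.core.connection.ack.wait.timeout   120
```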