I am using Spark 1.1.0 and am seeing a lot of fetch failures caused by the
following exception:

java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec
        at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:854)
        at org.apache.spark.network.ConnectionManager$$anon$5$$anonfun$run$15.apply(ConnectionManager.scala:852)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.network.ConnectionManager$$anon$5.run(ConnectionManager.scala:852)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)

I have increased spark.core.connection.ack.wait.timeout from the 60 seconds
shown in the exception to 120 seconds. That helped, but only slightly: the
fetch failures still occur. I am fairly confident they are not caused by GC
pauses on the executors. What else could be the reason for this?
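
For reference, the timeout change amounts to something like the line below.
This is only a sketch: placing it in conf/spark-defaults.conf is an
assumption for illustration, and the same property could equally be set on
the SparkConf or passed with spark-submit --conf.

    # conf/spark-defaults.conf (assumed location)
    # Seconds to wait for an ack before the connection gives up.
    spark.core.connection.ack.wait.timeout   120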

Chen
