I’m getting the same issue on Spark 1.2.0. Despite having set
“spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verified it in
the job UI (port 4040) environment tab, I still get the “no heartbeat in 60
seconds” error.
spark.core.connection.ack.wait.timeout=3600
15/01/22 07:29:
Darin,
You might want to increase these config options also:
spark.akka.timeout 300
spark.storage.blockManagerSlaveTimeoutMs 30
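Taken together, the settings discussed in this thread would look roughly like the
following in spark-defaults.conf. This is only a sketch using the values quoted
above, not a recommendation; note that spark.storage.blockManagerSlaveTimeoutMs
is a millisecond setting, so the "30" quoted above looks truncated and should be
treated as a placeholder:

```
# spark-defaults.conf -- sketch combining the settings mentioned in this thread
spark.core.connection.ack.wait.timeout    3600
spark.akka.timeout                        300
# Millisecond property; the value "30" quoted in this thread appears truncated,
# so substitute a suitably large value for your workload.
spark.storage.blockManagerSlaveTimeoutMs  30
```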
On Thu, Nov 13, 2014 at 11:31 AM, Darin McBeath wrote:
> For one of my Spark jobs, my workers/executors are dying and leaving the
> cluster.
>
> On the master, I
Hi Darin,
In our case, we were getting this error due to long GC pauses in our app.
Fixing the underlying code removed the error for us. This is also
mentioned as point 1 in the link below:
http://mail-archives.apache.org/mod_mbox/spark-user/201409.mbox/%3cca+-p3ah5aamgtke6viycwb24ohsnmaqm1q9x5
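As a first step toward confirming whether GC pauses are the cause, you can turn
on GC logging in the executor JVMs. A minimal sketch, assuming the standard
HotSpot GC-logging flags of that era passed through spark.executor.extraJavaOptions:

```
# spark-defaults.conf -- enable GC logging on executors (sketch)
spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```

Long pauses in the resulting executor stdout logs (comparable to the heartbeat
timeout) would point at GC rather than the network settings.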
For one of my Spark jobs, my workers/executors are dying and leaving the
cluster.
On the master, I see something like the following in the log file. I'm
surprised to see '60 seconds' in the master log below, because I explicitly
set it to '600' (or so I thought) in my Spark job (see below).