Hi,
I have a Spark Streaming application that pumps data from Kafka into
HDFS and Elasticsearch. The application runs on a Spark Standalone
cluster (in client mode).
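
For reference, the job is submitted roughly like this (the class name and
values below are placeholders, not the real ones):

    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode client \
      --class com.example.StreamingJob \
      --executor-memory 16G \
      --total-executor-cores 16 \
      streaming-job.jar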

Whenever one of the executors fails (or is killed), a new executor for the
application is spawned on every node in the cluster.
Since the executors have a pretty high memory allocation, this typically
leads to a cascading failure.

The application is using updateStateByKey and checkpointing (on HDFS). I
was able to reproduce the issue with the application on Spark 1.6.0 and
1.6.1.
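
The state and checkpointing are set up in the driver roughly like this
(app name, topic, broker, paths and batch interval are simplified
placeholders, not the actual code):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val checkpointDir = "hdfs:///app/checkpoints"  // simplified path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("kafka-to-hdfs-es")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)

      // Direct Kafka stream; broker and topic settings simplified.
      val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))

      // Running count per key, updated on every batch via updateStateByKey.
      val state = messages
        .map { case (_, value) => (value, 1L) }
        .updateStateByKey[Long] { (batch: Seq[Long], current: Option[Long]) =>
          Some(current.getOrElse(0L) + batch.sum)
        }

      // In the real job the writes to HDFS and Elasticsearch happen here.
      state.foreachRDD { rdd => () }
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()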

I tried to reproduce the behavior with the streaming.HdfsWordCount example,
but the executor restart worked fine there.

Thanks,
V.
