Hello Sparkers,

I'm currently running load tests on a Spark Streaming job. When the task
duration increases beyond the batchDuration, the job becomes unstable. In
the logs I see tasks failing with the following message:

Job aborted due to stage failure: Task 266.0:1 failed 4 times, most recent
failure: Exception failure in TID 19929 on host dnode-0.hdfs.private:
java.lang.Exception: Could not compute split, block input-2-1409835930000
not found org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
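For context, here is a stripped-down sketch of the kind of job I mean (the
source, host, port and the 10-second batch interval are made up for
illustration, not the actual values):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LoadTestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("streaming-load-test")
        // batchDuration: under peak load, processing a single batch can take
        // longer than this interval
        val ssc = new StreamingContext(conf, Seconds(10))

        // receiver-based input stream; the "input-N-<timestamp>" blocks in the
        // error above are the blocks written by such a receiver
        val lines = ssc.socketTextStream("some-host", 9999,
          StorageLevel.MEMORY_AND_DISK_SER_2)

        // per-batch work; this is the part that overruns the batch interval
        lines.map(expensiveParse).count().print()

        ssc.start()
        ssc.awaitTermination()
      }

      // placeholder for the real per-record work
      def expensiveParse(line: String): String = line
    }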

I understand it's not healthy for the task execution duration to be longer
than the batchDuration, but I'd expect us to be able to handle peaks.
I'm wondering whether this is Spark Streaming's 'graceful degradation', or
whether data is being lost at that moment. What is the reason for the lost
block, and what is the recommended approach to deal with this?
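To quantify how often a batch overruns the interval, I was planning to hook
in a StreamingListener along these lines (just a sketch, using the delay
fields as I understand the API):

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Logs batches whose processing time exceeds the batch interval
    class OverrunListener(batchMillis: Long) extends StreamingListener {
      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        val info = batchCompleted.batchInfo
        for (proc <- info.processingDelay; sched <- info.schedulingDelay) {
          if (proc > batchMillis) {
            println(s"Batch ${info.batchTime} overran: processing=$proc ms, scheduling delay=$sched ms")
          }
        }
      }
    }

    // registered on the context with: ssc.addStreamingListener(new OverrunListener(10000L))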

Thanks in advance,

Gerard.
