Hey Gerard,

Spark Streaming should just queue the processing and not delete the block
data. There have been reports of this error, but I am still unable to
reproduce the problem. One workaround you can try is setting the
configuration "spark.streaming.unpersist" to "false". This stops Spark
Streaming from cleaning up old blocks. See the Spark configuration page for
more details.
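
For reference, a minimal sketch of how that property could be set when
building the streaming context (the app name and batch interval below are
just placeholders, not taken from your job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Disable automatic unpersisting of old input blocks so that slow
    // batches can still find their input data. Trade-off: blocks stay in
    // memory longer, so watch executor memory usage.
    val conf = new SparkConf()
      .setAppName("StreamingLoadTest")               // placeholder name
      .set("spark.streaming.unpersist", "false")     // keep old blocks

    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder interval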

TD


On Thu, Sep 4, 2014 at 6:33 AM, Gerard Maas <gerard.m...@gmail.com> wrote:

> Hello Sparkers,
>
> I'm currently running load tests on a Spark Streaming job. When the task
> duration increases beyond the batchDuration, the job becomes unstable. In
> the logs I see tasks failing with the following message:
>
> Job aborted due to stage failure: Task 266.0:1 failed 4 times, most recent
> failure: Exception failure in TID 19929 on host dnode-0.hdfs.private:
> java.lang.Exception: Could not compute split, block input-2-1409835930000
> not found org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>
> I understand it's not healthy that the task execution duration is longer
> than the batchDuration, but I guess we should be able to support peaks.
> I'm wondering whether this is Spark Streaming's 'graceful degradation', or
> whether data is being lost at that moment? What is the reason for the block
> loss, and what is the recommended approach to deal with it?
>
> Thanks in advance,
>
> Gerard.
>
