I have been running into this as well, but I am using S3 for checkpointing,
so I chalked it up to network partitioning plus the fact that S3 isn't HDFS
as my storage location. But it seems that you are indeed using HDFS, so I
wonder if there is another underlying issue.
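
For reference, my setup is just the stock checkpoint-recovery pattern,
pointed at an s3a:// path instead of HDFS; the bucket, app name, and batch
interval below are placeholders, not my actual values:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical checkpoint location; mine is on S3, yours would be an hdfs:// path.
    val checkpointDir = "s3a://my-bucket/streaming/checkpoints"

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("my-streaming-app")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir) // metadata and RDD checkpoints land here
      // ... define sources and transformations here ...
      ssc
    }

    // Recover from an existing checkpoint, or build a fresh context on first run.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()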

On Wed, Mar 28, 2018 at 8:21 AM, Jone Zhang <joyoungzh...@gmail.com> wrote:

> The Spark Streaming job ran for a few days, then failed as below.
> What could the reason be?
>
> *18/03/25 07:58:37 ERROR yarn.ApplicationMaster: User class threw
> exception: org.apache.spark.SparkException: Job aborted due to stage
> failure: Task 16 in stage 80018.0 failed 4 times, most recent failure: Lost
> task 16.3 in stage 80018.0 (TID 7318859, 10.196.155.153):
> java.io.FileNotFoundException:
> /data/hadoop_tmp/nm-local-dir/usercache/mqq/appcache/application_1521712903594_6152/blockmgr-7aa2fb13-25d8-4145-a704-7861adfae4ec/22/shuffle_40009_16_0.data.574b45e8-bafd-437d-8fbf-deb6e3a1d001
> (No such file or directory)*
>
> Thanks!
>
>
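
One thing that may be worth checking on your side: a missing shuffle_*.data
file at fetch time usually means the executor (or its local dir) that wrote
it went away between the map and reduce sides of the stage, often because
YARN killed the executor for exceeding its container. As a hedged starting
point (illustrative values, and it assumes the spark_shuffle aux-service is
available on your NodeManagers), something like this in spark-defaults.conf
may help:

    # Let the YARN NodeManager serve shuffle files so they survive executor loss.
    # (Requires the spark_shuffle aux-service to be configured on each NodeManager.)
    spark.shuffle.service.enabled true

    # Extra off-heap headroom (MB) so YARN is less likely to kill executors for
    # overshooting their container, which is a common way shuffle files vanish.
    spark.yarn.executor.memoryOverhead 2048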


-- 

*Lucas Kacher*
Senior Engineer
-
vsco.co <https://www.vsco.co/>
New York, NY
818.512.5239
