Hi,

Could you write some simple code to reproduce the issue? I'm not sure why the shuffle data in the temp dir is being wrongly deleted.
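As a starting point, a job along these lines should at least exercise the BypassMergeSortShuffleWriter path in your stack trace (a minimal sketch, not a confirmed reproducer; the partition counts and data size are arbitrary assumptions):

// Shuffle-heavy sketch. Assumes the default
// spark.shuffle.sort.bypassMergeThreshold (200), so using fewer reduce
// partitions keeps the job on the bypass writer seen in the stack trace.
import org.apache.spark.{SparkConf, SparkContext}

object ShuffleRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shuffle-repro")
    val sc = new SparkContext(conf)
    try {
      // groupByKey does no map-side combine, which is one precondition
      // for the bypass merge-sort shuffle writer.
      sc.parallelize(0L until 10000000L, 400)
        .map(i => (i % 100L, i))
        .groupByKey(100)
        .count()
    } finally {
      sc.stop()
    }
  }
}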
thanks,

On Fri, Feb 26, 2016 at 6:00 AM, Zee Chen <zeo...@gmail.com> wrote:
> Hi,
>
> I am debugging a situation where SortShuffleWriter sometimes fails to
> create a file, with the following stack trace:
>
> 16/02/23 11:48:46 ERROR Executor: Exception in task 13.0 in stage
> 47827.0 (TID 1367089)
> java.io.FileNotFoundException:
> /tmp/spark-9dd8dca9-6803-4c6c-bb6a-0e9c0111837c/executor-129dfdb8-9422-4668-989e-e789703526ad/blockmgr-dda6e340-7859-468f-b493-04e4162d341a/00/temp_shuffle_69fe1673-9ff2-462b-92b8-683d04669aad
> (No such file or directory)
>         at java.io.FileOutputStream.open0(Native Method)
>         at java.io.FileOutputStream.open(FileOutputStream.java:270)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> I checked the Linux file system (ext4) and saw that the /00/ subdir was
> missing. I went through a heap dump of the CoarseGrainedExecutorBackend
> JVM process and found that DiskBlockManager's subDirs list had more
> non-null 2-hex subdirs than were present on the file system! As a test
> I created all 64 2-hex subdirs by hand, and the problem went away.
>
> So, has anybody else seen this problem? Looking at the relevant logic
> in DiskBlockManager, it hasn't changed much since the fix for
> https://issues.apache.org/jira/browse/SPARK-6468.
>
> My configuration:
> spark-1.5.1, hadoop-2.6.0, standalone, oracle jdk8u60
>
> Thanks,
> Zee

--
---
Takeshi Yamamuro
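One more note for context: the heap-dump observation above (non-null entries in subDirs with no matching directory on disk) lines up with how the 2-hex subdirs are managed. Below is a rough, simplified sketch approximating the directory-selection logic in DiskBlockManager.getFile around Spark 1.5 (class name and hashing are abbreviated assumptions; the real code also spreads files across multiple local dirs and throws on mkdir failure):

import java.io.File

// Simplified sketch: the 2-hex subdir (e.g. /00/) is created lazily on
// first use and then cached in the in-memory subDirs array. Later lookups
// trust the cached File and never re-check the disk.
class SubDirSketch(localDir: File, subDirsPerLocalDir: Int = 64) {
  private val subDirs = new Array[File](subDirsPerLocalDir)

  def getFile(filename: String): File = {
    val hash = filename.hashCode & Integer.MAX_VALUE // non-negative hash
    val subDirId = hash % subDirsPerLocalDir
    val subDir = subDirs.synchronized {
      Option(subDirs(subDirId)).getOrElse {
        val newDir = new File(localDir, "%02x".format(subDirId))
        newDir.mkdirs() // created once; error handling trimmed for brevity
        subDirs(subDirId) = newDir
        newDir
      }
    }
    new File(subDir, filename) // no re-check that subDir still exists
  }
}

If a cached entry goes stale, for instance because something outside Spark (a periodic /tmp cleaner is one guess, not confirmed) removes the subdirectory while the executor is running, the next DiskBlockObjectWriter.open under that path would fail exactly as in your stack trace. That would also explain why pre-creating all 64 subdirs by hand makes the problem go away.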