I've seen this kind of output when an executor was killed (e.g. by the OOM killer) or was lost for some other reason; check the machine to see whether the executor was restarted...
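
For what it's worth, the two-hex subdirectory in the failing path is picked by hash in DiskBlockManager.getFile, and the subdir is created lazily and then cached in the subDirs array; if the directory is later removed from disk, the cached entry goes stale and the next open fails exactly like the trace below. Here is a minimal, self-contained sketch of that mapping, paraphrased from the spark-1.5.x sources (the object and method names are mine, not Spark's):

import java.io.File

object SubDirSketch {
  // Default value of spark.diskStore.subDirectories
  val subDirsPerLocalDir = 64

  // Mirrors Utils.nonNegativeHash: a non-negative hash of the file name
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h != Int.MinValue) math.abs(h) else 0
  }

  // Maps a block/shuffle file name to <localDir>/<2-hex-digit subdir>/<name>.
  // In Spark itself, the subdir File is created on first use and cached in
  // subDirs(dirId)(subDirId); existence is not re-checked afterwards, so a
  // deleted directory leads to FileNotFoundException on the next write.
  def fileFor(localDirs: Array[File], filename: String): File = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
    val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
    new File(subDir, filename)
  }

  def main(args: Array[String]): Unit = {
    val dirs = Array(new File("/tmp/spark-local"))
    println(fileFor(dirs, "temp_shuffle_69fe1673"))
  }
}

Running it prints a path like /tmp/spark-local/<2-hex>/temp_shuffle_69fe1673, which also explains why hand-creating all 64 subdirs made the symptom go away.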
On 26 February 2016 at 08:37, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> Could you put together a simple snippet that reproduces the issue?
> I'm not exactly sure why shuffle data in the temp dir would be wrongly deleted.
>
> thanks,
>
> On Fri, Feb 26, 2016 at 6:00 AM, Zee Chen <zeo...@gmail.com> wrote:
>
>> Hi,
>>
>> I am debugging a situation where SortShuffleWriter sometimes fails to
>> create a file, with the following stack trace:
>>
>> 16/02/23 11:48:46 ERROR Executor: Exception in task 13.0 in stage
>> 47827.0 (TID 1367089)
>> java.io.FileNotFoundException:
>> /tmp/spark-9dd8dca9-6803-4c6c-bb6a-0e9c0111837c/executor-129dfdb8-9422-4668-989e-e789703526ad/blockmgr-dda6e340-7859-468f-b493-04e4162d341a/00/temp_shuffle_69fe1673-9ff2-462b-92b8-683d04669aad
>> (No such file or directory)
>>         at java.io.FileOutputStream.open0(Native Method)
>>         at java.io.FileOutputStream.open(FileOutputStream.java:270)
>>         at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>>         at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> I checked the Linux file system (ext4) and saw that the /00/ subdir was
>> missing. I went through the heap dump of the CoarseGrainedExecutorBackend
>> JVM process and found that DiskBlockManager's subDirs list had more
>> non-null 2-hex subdirs than were present on the file system! As a test I
>> created all 64 2-hex subdirs by hand, and the problem went away.
>>
>> Has anybody else seen this problem? Looking at the relevant logic in
>> DiskBlockManager, it hasn't changed much since the fix for
>> https://issues.apache.org/jira/browse/SPARK-6468
>>
>> My configuration:
>> spark-1.5.1, hadoop-2.6.0, standalone, oracle jdk8u60
>>
>> Thanks,
>> Zee
>
> --
> ---
> Takeshi Yamamuro