Which version of Spark are you using? Did you try setting
spark.shuffle.blockTransferService=nio?
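
For reference, a minimal sketch of setting that property when building the
SparkConf (the app name here is just a placeholder; the same property can also
be passed on the command line as --conf spark.shuffle.blockTransferService=nio):

    import org.apache.spark.{SparkConf, SparkContext}

    // Fall back to the nio block transfer service
    // (netty is the default from Spark 1.2 onwards)
    val conf = new SparkConf()
      .setAppName("MyJob") // placeholder app name
      .set("spark.shuffle.blockTransferService", "nio")
    val sc = new SparkContext(conf)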

Thanks
Best Regards

On Sat, Apr 18, 2015 at 11:14 PM, roy <rp...@njit.edu> wrote:

> Hi,
>
> My Spark job is failing with the following error message:
>
> org.apache.spark.shuffle.FetchFailedException: /mnt/ephemeral12/yarn/nm/usercache/abc/appcache/application_1429353954024_1691/spark-local-20150418132335-0723/28/shuffle_3_1_0.index (No such file or directory)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:89)
>         at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
>         at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:210)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: /mnt/ephemeral12/yarn/nm/usercache/abc/appcache/application_1429353954024_1691/spark-local-20150418132335-0723/28/shuffle_3_1_0.index (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
>         at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:235)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:268)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:115)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:76)
>         at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
>         at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>         at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>         ... 7 more
> )
>
>
> My job conf was:
> --num-executors 30 --driver-memory 3G --executor-memory 5G --conf
> spark.yarn.executor.memoryOverhead=1000 --conf
> spark.yarn.driver.memoryOverhead=1000 --conf spark.akka.frameSize=100000000
> --executor-cores 3
>
> Any idea why it's failing on the shuffle fetch?
>
> thanks
>
