Hello all,

I am facing a FileNotFoundException for a shuffle index file when running a job with large data. The same job runs fine with smaller datasets. These are my cluster specifications:

No. of nodes - 19
Total cores - 380
Memory per executor - 32G
Spark version - 1.6 (MapR distribution)
spark.shuffle.service.enabled - false

I am running the job with 28G of memory, 50 executors, and 1 core per executor. The job is failing at a stage with a DataFrame explode, where each row gets multiplied into 6 rows (a minimal sketch of this step is included below). Here are the exception details:

    Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /tmp/hadoop-mapr/nm-local-dir/usercache/sshinde/appcache/application_1480622725467_0071/blockmgr-3b2051f5-81c8-40a5-a332-9d32b4586a5d/38/shuffle_14_229_0.index (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:191)
        at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:291)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
        at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)

I tried the configurations below, but nothing worked out (they are pulled together in the sketch after this list):

    conf.set("spark.io.compression.codec", "lz4")
    conf.set("spark.network.timeout", "1000s")
    conf.set("spark.sql.shuffle.partitions", "2500")

In addition, spark.yarn.executor.memoryOverhead should already be high given the 32g of executor memory (10% of 32g), and I increased the number of partitions up to 15000.
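For reference, here is everything I tried in one place, as a minimal sketch rather than my actual driver code. The app name is illustrative, and the explicit memoryOverhead value (3277, roughly 10% of 32g) is my own arithmetic, since spark.yarn.executor.memoryOverhead is given in megabytes:

    import org.apache.spark.SparkConf

    // Consolidated configuration overrides (sketch, not the real job).
    val conf = new SparkConf()
      .setAppName("large-data-job") // illustrative name
      .set("spark.io.compression.codec", "lz4")
      .set("spark.network.timeout", "1000s")
      .set("spark.sql.shuffle.partitions", "2500") // also tried values up to 15000
      .set("spark.yarn.executor.memoryOverhead", "3277") // ~10% of 32g, in MB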
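To make the failing step concrete, here is a minimal, self-contained sketch of the explode stage. The DataFrame, column names, and the 6-element arrays are illustrative, not my actual schema; the point is just that every input row becomes 6 output rows, which inflates the data shuffled at this stage:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.explode

    object ExplodeSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("explode-sketch"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Illustrative data: one 6-element array per row.
        val df = sc.parallelize(Seq(
          (1, Seq("a", "b", "c", "d", "e", "f")),
          (2, Seq("u", "v", "w", "x", "y", "z"))
        )).toDF("id", "items")

        // Each input row is multiplied into 6 rows, one per array element.
        val exploded = df.select($"id", explode($"items").as("item"))
        exploded.show()
      }
    }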
I checked the YARN logs briefly and nothing stands out apart from the above exception. Please let me know if there is something I am missing, or if there are alternatives to make large-data jobs run.

Thank you.

Thanks,
Swapnil