If you look a bit further in the error logs, you will likely see other
issues, such as GC overhead errors, that cause the next set of tasks to fail.
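
The executor-lost / "remote Akka client disassociated" messages below usually
mean YARN killed the container for running past its memory limit, and the
shuffle FileNotFoundException is then often a secondary symptom of tasks
touching files from a dead executor. The usual first steps are to raise the
YARN memory overhead and to increase the number of shuffle partitions so each
task processes a smaller slice of the join. A rough sketch, with illustrative
values you would tune for your cluster:

  spark-submit \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    --conf spark.sql.shuffle.partitions=1000 \
    ...

You can also set the partition count from inside the job with
sqlContext.setConf("spark.sql.shuffle.partitions", "1000") before running
the join.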

Thanks,
Best Regards

On Thu, Sep 17, 2015 at 9:26 AM, Gang Bai <baig...@staff.sina.com.cn> wrote:

> Hi all,
>
> I’m joining two tables on a specific attribute. The job is like
> `sqlContext.sql("SELECT * FROM tableA LEFT JOIN tableB on
> tableA.uuid=tableB.uuid")`, where tableA and tableB are two temp tables,
> both of which are around 100 GB in size and are not skewed on 'uuid'.
>
> As I run the application, I constantly see two kinds of errors in the logs:
>
> One is like:
>
> 15/09/17 11:06:50 WARN TaskSetManager: Lost task 2946.0 in stage 1.0 (TID 1228, 10.39.2.93): java.io.FileNotFoundException: /data2/hadoop/local/usercache/megatron/appcache/application_1435099124107_3613186/blockmgr-4761cb8d-0dbd-4832-98ef-e64a787e09d4/2f/shuffle_1_2946_0.data (No such file or directory)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:130)
>         at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:201)
>         at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$5$$anonfun$apply$2.apply(ExternalSorter.scala:759)
>         at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$5$$anonfun$apply$2.apply(ExternalSorter.scala:758)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$5.apply(ExternalSorter.scala:758)
>         at org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$5.apply(ExternalSorter.scala:754)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:754)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> and the other is like:
>
> 15/09/17 11:06:50 ERROR YarnScheduler: Lost executor 925 on 10.39.7.87: remote Akka client disassociated
> 15/09/17 11:06:50 INFO TaskSetManager: Re-queueing tasks for 925 from TaskSet 1.0
> 15/09/17 11:06:50 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.39.7.87:52148] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 15/09/17 11:06:50 WARN TaskSetManager: Lost task 1321.0 in stage 1.0 (TID 1142, 10.39.7.87): ExecutorLostFailure (executor 925 lost)
> 15/09/17 11:06:50 INFO DAGScheduler: Executor lost: 925 (epoch 1659)
> 15/09/17 11:06:50 INFO BlockManagerMasterActor: Trying to remove executor 925 from BlockManagerMaster.
> 15/09/17 11:06:50 INFO BlockManagerMasterActor: Removing block manager BlockManagerId(925, 10.39.7.87, 51494)
> 15/09/17 11:06:50 INFO BlockManagerMaster: Removed 925 successfully in removeExecutor
>
> Increasing the number of executors and the executor memory didn't help.
> This seems like a very basic SQL use case, so my question is: how do I
> solve this issue?
>
> Thanks,
> Gang
>