Qiuzhuang,

This is a known issue: in certain situations, ExternalAppendOnlyMap can do
tons of tiny spills. SPARK-4452 aims to deal with it, but we haven't
finalized a solution yet.

Dinesh's solution should help as a workaround, but you'll likely experience
suboptimal performance when trying to merge tons of small files from disk.
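
If bumping the descriptor limit alone doesn't cut it, you can also try to
make the map spill less often in the first place. A minimal sketch for a
Spark 1.x-era job; the app name and values below are just illustrative, and
spark.shuffle.consolidateFiles only affects the hash-based shuffle:

    import org.apache.spark.{SparkConf, SparkContext}

    // Give shuffle aggregation a larger share of executor memory so
    // ExternalAppendOnlyMap hits its spill threshold less often
    // (spark.shuffle.memoryFraction defaults to 0.2), and consolidate
    // shuffle outputs into fewer files on disk.
    val conf = new SparkConf()
      .setAppName("etl-job") // illustrative name
      .set("spark.shuffle.memoryFraction", "0.4")
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)

Fewer, larger spills also mean fewer on-disk files to merge back, which
should soften the performance hit mentioned above.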

-Sandy

On Wed, Nov 19, 2014 at 10:10 PM, Dinesh J. Weerakkody <
dineshjweerakk...@gmail.com> wrote:

> Hi Qiuzhuang,
>
> This is a Linux-related issue. Please go through [1] and increase the open
> files limit; hopefully that will solve your problem.
>
> [1] https://rtcamp.com/tutorials/linux/increase-open-files-limit/
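>
> For reference, the change usually looks roughly like this (the username and
> limit value here are just examples; see [1] for the full steps):
>
>     $ ulimit -n          # check the current soft limit for open files
>     1024
>
>     # In /etc/security/limits.conf, raise the limit for the user that
>     # runs the Spark processes:
>     sparkuser    soft    nofile    65535
>     sparkuser    hard    nofile    65535
>
> You may need to start a new login session (or restart the Spark daemons)
> before the new limit takes effect.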
>
> On Thu, Nov 20, 2014 at 9:45 AM, Qiuzhuang Lian <qiuzhuang.l...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > While doing some ETL, I ran into a 'Too many open files' error, as shown
> > in the logs below.
> >
> > Thanks,
> > Qiuzhuang
> >
> > 14/11/20 20:12:02 INFO collection.ExternalAppendOnlyMap: Thread 63 spilling in-memory map of 100.8 KB to disk (953 times so far)
> > 14/11/20 20:12:02 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b
> > java.io.FileNotFoundException: /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b (Too many open files)
> >         at java.io.FileOutputStream.open(Native Method)
> >         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> >         at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:178)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:203)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
> >         at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:77)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:131)
> >         at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
> >         at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
> >         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> >         at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
> >         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:56)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:744)
> > 14/11/20 20:12:02 ERROR executor.Executor: Exception in task 0.0 in stage 36.0 (TID 20)
> > java.io.FileNotFoundException: /tmp/spark-local-20141120200455-4137/2f/temp_local_f83cbf2f-60a4-4fbd-b5d2-32a0c569311b (Too many open files)
> >         at java.io.FileOutputStream.open(Native Method)
> >         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
> >         at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
> >         at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:180)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
> >         at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:77)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
> >         at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:131)
> >
>
>
>
> --
> Thanks & Best Regards,
>
> *Dinesh J. Weerakkody*
>
