Andrew, thanks for the suggestion, but unfortunately it didn't work -- still getting the same exception.

On Mon, Mar 21, 2016 at 10:32 AM Andrew Or <and...@databricks.com> wrote:
> @Nezih, can you try again after setting `spark.memory.useLegacyMode` to
> true? Can you still reproduce the OOM that way?
>
> 2016-03-21 10:29 GMT-07:00 Nezih Yigitbasi <nyigitb...@netflix.com.invalid>:
>
>> Hi Spark devs,
>> I am using 1.6.0 with dynamic allocation on yarn. I am trying to run a
>> relatively big application with 10s of jobs and 100K+ tasks and my app
>> fails with the exception below. The closest jira issue I could find is
>> SPARK-11293 <https://issues.apache.org/jira/browse/SPARK-11293>, which
>> is a critical bug that has been open for a long time. There are other
>> similar jira issues (all fixed): SPARK-10474
>> <https://issues.apache.org/jira/browse/SPARK-10474>, SPARK-10733
>> <https://issues.apache.org/jira/browse/SPARK-10733>, SPARK-10309
>> <https://issues.apache.org/jira/browse/SPARK-10309>, SPARK-10379
>> <https://issues.apache.org/jira/browse/SPARK-10379>.
>>
>> Any workarounds to this issue or any plans to fix it?
>>
>> Thanks a lot,
>> Nezih
>>
>> 16/03/19 05:12:09 INFO memory.TaskMemoryManager: Memory used in task 46870
>> 16/03/19 05:12:09 INFO memory.TaskMemoryManager: Acquired by org.apache.spark.shuffle.sort.ShuffleExternalSorter@1c36f801: 32.0 KB
>> 16/03/19 05:12:09 INFO memory.TaskMemoryManager: 1512915599 bytes of memory were used by task 46870 but are not associated with specific consumers
>> 16/03/19 05:12:09 INFO memory.TaskMemoryManager: 1512948367 bytes of memory are used for execution and 156978343 bytes of memory are used for storage
>> 16/03/19 05:12:09 ERROR executor.Executor: Managed memory leak detected; size = 1512915599 bytes, TID = 46870
>> 16/03/19 05:12:09 ERROR executor.Executor: Exception in task 77.0 in stage 273.0 (TID 46870)
>> java.lang.OutOfMemoryError: Unable to acquire 128 bytes of memory, got 0
>>     at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
>>     at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
>>     at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
>>     at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
>>     at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 16/03/19 05:12:09 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-8,5,main]
>> java.lang.OutOfMemoryError: Unable to acquire 128 bytes of memory, got 0
>>     at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
>>     at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
>>     at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
>>     at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
>>     at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> 16/03/19 05:12:10 INFO storage.DiskBlockManager: Shutdown hook called
>> 16/03/19 05:12:10 INFO util.ShutdownHookManager: Shutdown hook called
>>
>
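For reference, the workaround Andrew suggested switches Spark 1.6 from the unified memory manager back to the pre-1.6 static one via `spark.memory.useLegacyMode`. A minimal sketch of how that flag could be passed at submit time (the application class, jar name, and the fraction values are illustrative assumptions, not taken from this thread):

```shell
# Sketch: re-enable the legacy (static) memory manager on Spark 1.6.
# spark.shuffle.memoryFraction / spark.storage.memoryFraction are only
# honored in legacy mode; the values shown here are the 1.6 defaults.
spark-submit \
  --master yarn \
  --conf spark.memory.useLegacyMode=true \
  --conf spark.shuffle.memoryFraction=0.2 \
  --conf spark.storage.memoryFraction=0.6 \
  --class com.example.MyApp \   # hypothetical application class
  myapp.jar                     # hypothetical application jar
```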