Hi,

Yes, this is a problem, and I'm not aware of any simple workarounds (or
complex ones, for that matter). There are people working to fix this; you
can follow progress here:
https://issues.apache.org/jira/browse/SPARK-1239
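Just to illustrate the arithmetic from your message below (this is a
back-of-envelope sketch, not Spark's actual code): if, as it appears, every
executor's request for the map output statuses is answered with its own
freshly serialized copy, the driver's peak allocation grows linearly with
the number of executors:

    // Illustrative only, not Spark's actual code: assumes one serialized
    // copy of the statuses is allocated per in-flight executor request.
    object MapStatusOomEstimate {
      def main(args: Array[String]): Unit = {
        val statusesBytes = 10L * 1024 * 1024  // ~10 MB of serialized output statuses
        val numExecutors  = 500                // all requesting at roughly the same time

        // One HeapByteBuffer per in-flight reply => up to N copies alive at once.
        val peakBytes = statusesBytes * numExecutors
        val peakGb    = peakBytes / (1024.0 * 1024 * 1024)
        println(f"Peak driver allocation: ~$peakGb%.1f GB")  // ~4.9 GB
      }
    }

~4.9 GB of buffers is comfortably past a 4 GB heap, which lines up with the
OOM in HeapByteBuffer's <init> that you mention below.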
On Tue, Sep 9, 2014 at 2:54 PM, jbeynon <jbey...@gmail.com> wrote:
> I'm running on Yarn with relatively small instances with 4 GB of memory.
> I'm not caching any data, but when the map stage ends and shuffling begins,
> all of the executors request the map output locations at the same time,
> which seems to kill the driver when the number of executors is turned up.
>
> For example, the "size of output statuses" is about 10 MB, and with 500
> executors the driver appears to be making 500 copies of this data (~5 GB in
> total) to send out, and running out of memory. When I turn down the number
> of executors, everything runs fine.
>
> Has anyone else run into this? Maybe I'm misunderstanding the underlying
> cause. I don't have a copy of the stack trace handy but can recreate it if
> necessary. It was somewhere in the <init> for HeapByteBuffer. Any advice
> would be helpful.

--
Marcelo