Kendal, have you tried reducing the number of partitions?
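With 20K+ shuffle partitions, the map-output status table the driver has to serialize and send to every executor gets very large, which matches the OOM inside Akka's JavaSerializer in your trace. A minimal sketch of passing an explicit, smaller partition count into the reduce side is below; the paths, the keying logic, and the 2000-partition figure are assumptions for illustration, not values from your job:

    // Hypothetical map -> reduce -> saveAsTextFile job; paths, key/value
    // types, and the 2000-partition count are illustrative assumptions.
    import org.apache.spark.{SparkConf, SparkContext}

    object FewerShufflePartitions {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("fewer-shuffle-partitions")
        val sc   = new SparkContext(conf)

        sc.textFile("hdfs:///input/bigdata")        // hypothetical input path
          .map(line => (line.split('\t')(0), 1L))   // hypothetical map step
          .reduceByKey(_ + _, 2000)                 // explicit, smaller shuffle width
          .saveAsTextFile("hdfs:///output/result")  // hypothetical output path

        sc.stop()
      }
    }

The second argument to reduceByKey sets the number of reduce-side partitions, which is what drives the size of the map-output status structures the driver has to track and ship.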
--
Be well!
Jean Morozov

On Mon, Dec 28, 2015 at 9:02 AM, kendal <ken...@163.com> wrote:
> My driver is running OOM with my 4T data set... I don't collect any data
> to the driver. All the program does is map - reduce - saveAsTextFile. But
> the number of partitions to be shuffled is quite large - 20K+.
>
> The symptom I'm seeing is a timeout when GetMapOutputStatuses is fetched
> from the driver:
> 15/12/24 02:04:21 INFO spark.MapOutputTrackerWorker: Don't have map outputs
> for shuffle 0, fetching them
> 15/12/24 02:04:21 INFO spark.MapOutputTrackerWorker: Doing the fetch;
> tracker endpoint =
> AkkaRpcEndpointRef(Actor[akka.tcp://sparkDriver@10.115.58.55:52077/user/MapOutputTracker#-1937024516])
> 15/12/24 02:06:21 WARN akka.AkkaRpcEndpointRef: Error sending message
> [message = GetMapOutputStatuses(0)] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
> seconds]. This timeout is controlled by spark.rpc.askTimeout
>     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
>
> But the root cause is an OOM:
> 15/12/24 02:05:36 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.remote.default-remote-dispatcher-24] shutting down
> ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
>     at akka.serialization.JavaSerializer.toBinary(Serializer.scala:131)
>     at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
>     at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:843)
>     at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:843)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>     at akka.remote.EndpointWriter.serializeMessage(Endpoint.scala:842)
>     at akka.remote.EndpointWriter.writeSend(Endpoint.scala:743)
>     at akka.remote.EndpointWriter$$anonfun$4.applyOrElse(Endpoint.scala:718)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:411)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>
> I've already allocated 16G of memory for my driver - which is the hard
> maximum on my YARN cluster. And I've also applied Kryo serialization...
> Any ideas for reducing the memory footprint?
> What confuses me is: even though I have 20K+ partitions to shuffle, why do
> I need so much memory?!
>
> Thank you so much for any help!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Help-Driver-OOM-when-shuffle-large-amount-of-data-tp25818.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
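P.S. Your trace itself names spark.rpc.askTimeout as the knob behind the
120-second failure, so raising it can at least keep the fetch alive while you
experiment with partition counts. A minimal sketch of the driver-side settings
touched on in this thread follows (paste into spark-shell or wrap in an app);
the concrete values are illustrative guesses, not tuned recommendations:

    // Illustrative settings only; every value here is a guess meant to show
    // where the knobs live, not a recommendation for kendal's 4T job.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("shuffle-oom-experiments")
      .set("spark.rpc.askTimeout", "600s")       // named in the timeout trace above
      .set("spark.default.parallelism", "2000")  // caps shuffle width where no explicit count is given
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // you already use Kryo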