Kendal, have you tried reducing the number of partitions?
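With 20K+ shuffle partitions, the map-output status table the driver has to serialize and send to every executor gets very large, which matches the OOM inside Akka's JavaSerializer in your trace. A minimal sketch of passing an explicit, smaller partition count into the reduce side is below; the paths, the keying logic, and the 2000-partition figure are assumptions for illustration, not values from your job:

    // Hypothetical map -> reduce -> saveAsTextFile job; paths, key/value
    // types, and the 2000-partition count are illustrative assumptions.
    import org.apache.spark.{SparkConf, SparkContext}

    object FewerShufflePartitions {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("fewer-shuffle-partitions")
        val sc   = new SparkContext(conf)

        sc.textFile("hdfs:///input/bigdata")        // hypothetical input path
          .map(line => (line.split('\t')(0), 1L))   // hypothetical map step
          .reduceByKey(_ + _, 2000)                 // explicit, smaller shuffle width
          .saveAsTextFile("hdfs:///output/result")  // hypothetical output path

        sc.stop()
      }
    }

The second argument to reduceByKey sets the number of reduce-side partitions, which is what drives the size of the map-output status structures the driver has to track and ship.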
--
Be well!
Jean Morozov

On Mon, Dec 28, 2015 at 9:02 AM, kendal <ken...@163.com> wrote:
> My driver is running OOM with my 4T data set... I don't collect any data
> to the driver. All the program does is map - reduce - saveAsTextFile. But
> the number of partitions to be shuffled is quite large - 20K+.
>
> The symptom I'm seeing is a timeout when GetMapOutputStatuses is fetched
> from the driver:
> 15/12/24 02:04:21 INFO spark.MapOutputTrackerWorker: Don't have map outputs
> for shuffle 0, fetching them
> 15/12/24 02:04:21 INFO spark.MapOutputTrackerWorker: Doing the fetch;
> tracker endpoint =
> AkkaRpcEndpointRef(Actor[akka.tcp://sparkDriver@10.115.58.55:52077/user/MapOutputTracker#-1937024516])
> 15/12/24 02:06:21 WARN akka.AkkaRpcEndpointRef: Error sending message
> [message = GetMapOutputStatuses(0)] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120
> seconds]. This timeout is controlled by spark.rpc.askTimeout
>     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
>
> But the root cause is an OOM:
> 15/12/24 02:05:36 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [sparkDriver-akka.remote.default-remote-dispatcher-24] shutting down
> ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2271)
>     at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
>     at akka.serialization.JavaSerializer.toBinary(Serializer.scala:131)
>     at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36)
>     at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:843)
>     at akka.remote.EndpointWriter$$anonfun$serializeMessage$1.apply(Endpoint.scala:843)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>     at akka.remote.EndpointWriter.serializeMessage(Endpoint.scala:842)
>     at akka.remote.EndpointWriter.writeSend(Endpoint.scala:743)
>     at akka.remote.EndpointWriter$$anonfun$4.applyOrElse(Endpoint.scala:718)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:411)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>
> I've already allocated 16G of memory for my driver - which is the hard
> maximum on my YARN cluster. And I've also applied Kryo serialization...
> Any ideas for reducing the memory footprint?
> What confuses me is: even though I have 20K+ partitions to shuffle, why do
> I need so much memory?!
>
> Thank you so much for any help!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Help-Driver-OOM-when-shuffle-large-amount-of-data-tp25818.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
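P.S. Your trace itself names spark.rpc.askTimeout as the knob behind the
120-second failure, so raising it can at least keep the fetch alive while you
experiment with partition counts. A minimal sketch of the driver-side settings
touched on in this thread follows (paste into spark-shell or wrap in an app);
the concrete values are illustrative guesses, not tuned recommendations:

    // Illustrative settings only; every value here is a guess meant to show
    // where the knobs live, not a recommendation for kendal's 4T job.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("shuffle-oom-experiments")
      .set("spark.rpc.askTimeout", "600s")       // named in the timeout trace above
      .set("spark.default.parallelism", "2000")  // caps shuffle width where no explicit count is given
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // you already use Kryo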