Currently, I am using version 1.6.1. I continue to use it because my
current code relies heavily on RDDs rather than DataFrames, and because
1.6.1 has been more stable for me than newer versions.


The input data is user behavior data with 20 fields and 1 billion records
(~1.5 TB). I am trying to group by user id and calculate some per-user
statistics. My guess is that the number of map tasks is too high, resulting
in the spark.akka.frameSize error.
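For concreteness, the job does something like the following (a minimal
sketch: the input path, field positions, and the statistic are
placeholders, not my actual schema, and sc is the usual SparkContext). I
aggregate with aggregateByKey rather than groupByKey so that only one
small accumulator per user per partition crosses the shuffle:

import org.apache.spark.rdd.RDD

// Placeholder layout: tab-separated input, field 0 = user id,
// field 5 = a numeric metric.
val records: RDD[Array[String]] =
  sc.textFile("hdfs:///data/user_behavior").map(_.split("\t"))

// (count, sum) accumulator per user; aggregateByKey combines map-side,
// so only one small pair per user per partition is shuffled.
val userStats = records
  .map(r => (r(0), r(5).toDouble))
  .aggregateByKey((0L, 0.0))(
    (acc, v) => (acc._1 + 1, acc._2 + v), // fold one value into the accumulator
    (a, b) => (a._1 + b._1, a._2 + b._2)  // merge partial accumulators
  )
  .mapValues { case (cnt, sum) => sum / cnt } // e.g. mean metric per user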

1) Does spark.akka.frameSize have to be increased in proportion to the size
of the data, which indirectly determines the number of partitions?
2) Or does the huge number of map tasks in the job (which may be
unavoidable) itself cause the frame size error?
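For reference, this is roughly how I am raising the limit (the app name is
a placeholder; spark.akka.frameSize takes a value in MB in 1.6.x):

val conf = new org.apache.spark.SparkConf()
  .setAppName("user-stats")           // placeholder name
  .set("spark.akka.frameSize", "512") // 512 MB
val sc = new org.apache.spark.SparkContext(conf)

// Equivalently at submit time:
//   spark-submit --conf spark.akka.frameSize=512 ...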

On Sun, Jan 29, 2017 at 11:15 PM, Jörn Franke wrote:

> Which Spark version are you using? What are you trying to do exactly and
> what is the input data? As far as I know, akka has been dropped in recent
> Spark versions.
>
> > On 30 Jan 2017, at 00:44, aravasai wrote:
> >
> > I have a Spark job running on 2 terabytes of data which creates more than
> > 30,000 partitions. As a result, the Spark job fails with the error
> > "Map output statuses were 170415722 bytes which exceeds spark.akka.frameSize
> > 52428800 bytes" (for 1 TB of data).
> > However, when I increase spark.akka.frameSize to around 500 MB, the job
> > hangs with no further progress.
> >
> > So, what is the ideal or maximum limit that I can assign to
> > spark.akka.frameSize so that I do not get the error of map output statuses
> > exceeding the limit for large chunks of data?
> >
> > Is coalescing the data into a smaller number of partitions the only
> > solution to this problem? Is there any better way than coalescing many
> > intermediate RDDs in the program?
> >
> > My driver memory: 10G
> > Executor memory: 36G
> > Executor memory overhead: 3G
> >



