The job ended up running overnight with no progress. :-(
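For context, the shape of my job is roughly the sketch below. This is not my actual code: a plain Set stands in for the Trie, and s3n://documents is a placeholder path.

    import org.apache.spark.{SparkConf, SparkContext}

    object ClosureCaptureSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("location-match"))

        // Build a large lookup structure on the driver. A plain Set stands
        // in for the Trie; s3n://documents below is a placeholder path.
        val locations: Set[String] =
          sc.textFile("s3n://geonames").collect().toSet

        // Referencing `locations` inside the closure serializes the whole
        // structure into every task, which is consistent with the ~28 MB
        // "Serialized task" lines in the logs below.
        val docs = sc.textFile("s3n://documents")
        val matches = docs.filter(line => locations.contains(line))
        println(matches.count())
      }
    }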
On Sat, Aug 16, 2014 at 12:16 AM, Jerry Ye <jerr...@gmail.com> wrote:
> Hi Xiangrui,
> I actually tried branch-1.1 and master, and both resulted in the job being
> stuck at the TaskSetManager:
>
> 14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor 8: ip-10-226-199-225.us-west-2.compute.internal (PROCESS_LOCAL)
> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 28055875 bytes in 162 ms
> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 0: ip-10-249-53-62.us-west-2.compute.internal (PROCESS_LOCAL)
> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 28055875 bytes in 178 ms
>
> It's been 10 minutes with no progress on relatively small data. I'll let
> it run overnight and update in the morning. Is there somewhere I should
> look to see what is happening? I tried to ssh into an executor and look
> at /root/spark/logs, but there wasn't anything informative there.
>
> I'm sure countByValue works fine, but my use of a HashMap is only an
> example. In my actual task, I'm loading a Trie data structure to perform
> efficient string matching between a dataset of locations and strings that
> may contain mentions of those locations.
>
> This seems like a common need: processing input against a relatively
> memory-intensive object like a Trie. I hope I'm not missing something
> obvious. Do you know of any example code like my use case?
>
> Thanks!
>
> - jerry
>
> On Fri, Aug 15, 2014 at 10:02 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Just saw you used toArray on an RDD. That copies all of the data to the
>> driver, and it is deprecated. countByValue is what you need:
>>
>> val samples = sc.textFile("s3n://geonames")
>> val counts = samples.countByValue()
>> val result = samples.map(l => (l, counts.getOrElse(l, 0L)))
>>
>> Could you also try the latest branch-1.1 or master with the default
>> akka.frameSize setting? The serialized task size should be small
>> because we now broadcast RDD objects.
>>
>> -Xiangrui
>>
>> On Fri, Aug 15, 2014 at 5:11 PM, jerryye <jerr...@gmail.com> wrote:
>>
>>> Hi Xiangrui,
>>> You were right, I had to use --driver-memory on the command line
>>> instead of setting it in spark-defaults.conf.
>>>
>>> However, now my job just hangs with the following message:
>>>
>>> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 29433434 bytes in 202 ms
>>> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 1: ip-10-226-198-31.us-west-2.compute.internal (PROCESS_LOCAL)
>>> 14/08/15 23:54:46 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 29433434 bytes in 203 ms
>>>
>>> Any ideas on where else to look?
>>>
>>> On Fri, Aug 15, 2014 at 3:29 PM, Xiangrui Meng wrote:
>>>
>>>> Did you verify the driver memory in the Executors tab of the web UI?
>>>> I think you need `--driver-memory 8g` with spark-shell or
>>>> spark-submit instead of setting it in spark-defaults.conf.
>>>>
>>>> On Fri, Aug 15, 2014 at 12:41 PM, jerryye wrote:
>>>>
>>>>> Setting spark.driver.memory has no effect. It's still hanging trying
>>>>> to compute result.count when I'm sampling more than 35%, regardless
>>>>> of what value of spark.driver.memory I set.
>>>>>
>>>>> Here are my settings:
>>>>>
>>>>> export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
>>>>> export SPARK_MEM=10g
>>>>>
>>>>> In conf/spark-defaults.conf:
>>>>>
>>>>> spark.driver.memory 1500
>>>>> spark.serializer org.apache.spark.serializer.KryoSerializer
>>>>> spark.kryoserializer.buffer.mb 500
>>>>> spark.executor.memory 58315m
>>>>> spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
>>>>> spark.executor.extraClassPath /root/ephemeral-hdfs/conf