I've been trying different approaches to this: populating the trie on the 
driver and serializing the instance to the executors, broadcasting the strings 
in an array and populating the trie on the executors, and variants of what I 
broadcast or serialize. All of the approaches seem to hit a memory issue.
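
For concreteness, the broadcast-the-strings variant looks roughly like this 
(simplified; the Trie class below is a naive, self-contained stand-in for my 
actual structure, and the file names are just placeholders): 

    import org.apache.spark.{SparkConf, SparkContext}

    // Naive stand-in so the sketch compiles; the real class is a trie.
    class Trie(val words: Array[String]) extends Serializable {
      def matchesIn(text: String): Seq[String] =
        words.filter(w => text.contains(w))
    }

    val sc = new SparkContext(new SparkConf().setAppName("trie-match"))

    // Broadcast only the raw strings, not the built structure.
    val bcWords = sc.broadcast(sc.textFile("locations.txt").collect())

    val matches = sc.textFile("documents.txt").mapPartitions { lines =>
      val trie = new Trie(bcWords.value) // built on the executor, per partition
      lines.flatMap(line => trie.matchesIn(line).map(loc => (loc, line)))
    }
    matches.saveAsTextFile("matches")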

Xiangrui has been able to run this snippet on his cluster without problems and 
we're trying to identify the difference.

- jerry


> On Aug 18, 2014, at 9:21 AM, "zhazhan [via Apache Spark Developers List]" 
> <ml-node+s1001551n7901...@n3.nabble.com> wrote:
> 
> I'm not sure exactly how you use it. My understanding is that in Spark it is 
> better to keep the overhead on the driver as low as possible. Is it possible 
> to broadcast the trie to the executors, do the computation there, and then 
> aggregate the counters in the reduce phase? 
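> 
> Something like this sketch, assuming the trie serializes cleanly 
> (matchesIn is just a placeholder for your lookup method): 
> 
>     val bcTrie = sc.broadcast(trie)                  // shipped once per executor 
>     val counts = sc.textFile("docs") 
>       .flatMap(line => bcTrie.value.matchesIn(line)) // matched keys per line 
>       .map(key => (key, 1L)) 
>       .reduceByKey(_ + _)                            // counters aggregated on the executors 
>     counts.saveAsTextFile("counts") 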
> 
> Thanks. 
> 
> Zhan Zhang 
> 
> On Aug 18, 2014, at 8:54 AM, Jerry Ye <[hidden email]> wrote: 
> 
> > Hi Zhan, 
> > Thanks for looking into this. I'm actually using the hash map only as an 
> > example, the simplest snippet of code that fails for me. I know it's just 
> > word count. In my actual problem I'm using a Trie data structure to find 
> > substring matches. 
> > 
> > 
> > On Sun, Aug 17, 2014 at 11:35 PM, Zhan Zhang <[hidden email]> wrote: 
> > Is it because countByValue or toArray puts too much stress on the driver 
> > when there are many unique words? To me it looks like a typical word-count 
> > problem, which you can solve as follows (correct me if I am wrong): 
> > 
> > val textFile = sc.textFile("file") 
> > val counts = textFile.flatMap(line => line.split(" ")) 
> >   .map(word => (word, 1)) 
> >   .reduceByKey((a, b) => a + b) 
> > counts.saveAsTextFile("file") // don't collect the results to the driver; write them to a file instead 
> > 
> > Thanks. 
> > 
> > Zhan Zhang 
> > 
> > On Aug 16, 2014, at 9:18 AM, Jerry Ye <[hidden email]> wrote: 
> > 
> > > The job ended up running overnight with no progress. :-( 
> > > 
> > > 
> > > On Sat, Aug 16, 2014 at 12:16 AM, Jerry Ye <[hidden email]> wrote: 
> > > 
> > >> Hi Xiangrui, 
> > >> I actually tried branch-1.1 and master and it resulted in the job being 
> > >> stuck at the TaskSetManager: 
> > >> 14/08/16 06:55:48 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks 
> > >> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:0 as TID 2 on executor 8: ip-10-226-199-225.us-west-2.compute.internal (PROCESS_LOCAL) 
> > >> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:0 as 28055875 bytes in 162 ms 
> > >> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Starting task 1.0:1 as TID 3 on executor 0: ip-10-249-53-62.us-west-2.compute.internal (PROCESS_LOCAL) 
> > >> 14/08/16 06:55:48 INFO scheduler.TaskSetManager: Serialized task 1.0:1 as 28055875 bytes in 178 ms 
> > >> 
> > >> It's been 10 minutes with no progress on relatively small data. I'll let 
> > >> it run overnight and update in the morning. Is there some place I should 
> > >> look to see what is happening? I tried to ssh into the executor and look 
> > >> at /root/spark/logs, but there wasn't anything informative there. 
> > >> 
> > >> I'm sure countByValue works fine, but my use of a HashMap is only an 
> > >> example. In my actual task, I'm loading a Trie data structure to perform 
> > >> efficient string matching between a dataset of locations and strings 
> > >> possibly containing mentions of those locations. 
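> > >> 
> > >> Concretely, for each input string I want every location name that occurs 
> > >> in it as a substring, i.e. the trie replaces a naive scan like this 
> > >> (illustrative only): 
> > >> 
> > >>     def mentions(text: String, locations: Seq[String]): Seq[String] = 
> > >>       locations.filter(loc => text.contains(loc)) // the trie does this in one pass over text 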
> > >> 
> > >> This seems like a common thing: processing input with a relatively 
> > >> memory-intensive object like a Trie. I hope I'm not missing something 
> > >> obvious. Do you know of any example code like my use case? 
> > >> 
> > >> Thanks! 
> > >> 
> > >> - jerry 
> > >> 



