How many values are in that sequence, i.e. what is its size?

You can also profile your program while it’s running to see where it’s spending time. The easiest way is to get a single stack trace with jstack <process-id>. Maybe some of the serialization methods for this data are super inefficient, or toSeq on a map is inefficient. You could try word_mapping.value.toArray.

I’m also wondering if something earlier in the program is slow and this is just not obvious from the output.
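(A minimal sketch of that last suggestion, assuming word_mapping is a Spark broadcast variable wrapping a Map; the buildRdd helper and the key/value types are hypothetical, not from the original thread:)

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Materialize the broadcast map into a plain Array before parallelizing,
    // instead of relying on toSeq. The idea is that a strict array may be
    // cheaper to serialize and split into partitions than the Seq produced
    // by calling toSeq on a large map.
    def buildRdd(sc: SparkContext,
                 word_mapping: Broadcast[Map[String, Long]]) = {
      val pairs: Array[(String, Long)] = word_mapping.value.toArray // eager copy
      sc.parallelize(pairs)
    }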
Matei

On Apr 27, 2014, at 9:47 AM, Earthson <earthson...@gmail.com> wrote:

> That doesn't work. I don't think it is just slow; it never ends (30+ hours,
> and I killed it).
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4900.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.