How many values are in that sequence, i.e. what is its size?

You can also profile your program while it’s running to see where it’s spending time. The easiest way is to get a single stack trace with jstack <process-id>. Maybe some of the serialization methods for this data are super inefficient, or toSeq on a map is inefficient. You could try word_mapping.value.toArray.

I’m also wondering if something earlier in the program is slow and this is just not obvious from the output.
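(A minimal sketch of that last suggestion, assuming word_mapping is a Spark broadcast variable wrapping a Map; the buildRdd helper and the key/value types are hypothetical, not from the original thread:)

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    // Materialize the broadcast map into a plain Array before parallelizing,
    // instead of relying on toSeq. The idea is that a strict array may be
    // cheaper to serialize and split into partitions than the Seq produced
    // by calling toSeq on a large map.
    def buildRdd(sc: SparkContext,
                 word_mapping: Broadcast[Map[String, Long]]) = {
      val pairs: Array[(String, Long)] = word_mapping.value.toArray // eager copy
      sc.parallelize(pairs)
    }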
Matei

On Apr 27, 2014, at 9:47 AM, Earthson <earthson...@gmail.com> wrote:

> That doesn't work. I don't think it is just slow; it never ends (30+ hours,
> and I killed it).
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4900.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.