collect on partitions gets very slow near the last few partitions.

Sung Hwan Chung Fri, 27 Jun 2014 23:37:26 -0700

I'm doing something like this:

rdd.groupBy.map().collect()


The workload on the final map is pretty much evenly distributed.

When collect happens, say on 60 partitions, the first 55 or so partitions
finish very quickly, say within 10 seconds. However, the last 5,
particularly the very last one, typically get very slow, with the overall
collect time reaching 30 seconds and sometimes even 1 minute.

E.g., it would get stuck in a state like 54/55 for a much longer time.

Another interesting thing is that the first iteration typically doesn't
have this problem, but it gets progressively worse in subsequent
iterations despite their having about the same workload/partition sizes.

This problem worsens with a smaller Akka frame size (spark.akka.frameSize)
and/or a smaller spark.reducer.maxMbInFlight.
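
For reference, a rough sketch of where these two settings would be set,
assuming the Spark 1.0-era property names spark.akka.frameSize and
spark.reducer.maxMbInFlight (both in MB). The app name and the values shown
are hypothetical placeholders, not the ones from my job:

```scala
import org.apache.spark.SparkConf

// Hypothetical app name and illustrative values only (assumptions).
val conf = new SparkConf()
  .setAppName("collect-slowdown-repro")
  // Maximum Akka message size, in MB.
  .set("spark.akka.frameSize", "10")
  // Cap, in MB, on map output fetched simultaneously by each reduce task.
  .set("spark.reducer.maxMbInFlight", "48")
```

Lowering either value is what seems to make the slowdown more pronounced.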

Anyone know why this is so?
