Hi,

Is there a way to increase the number of partitions of an RDD without causing a 
shuffle? I found the JIRA issue https://issues.apache.org/jira/browse/SPARK-5997, 
but there is no implementation yet.

For context: I am reading data from ~300 large binary files, which results in 
300 partitions. I then need to sort the RDD, but the sort crashes with an 
OutOfMemoryError. If I increase the number of partitions to 2000, the sort 
works fine, but the repartitioning itself takes a long time due to the shuffle.

Best regards, Alexander
