Hi Xinh,
Thanks! Custom partitioner with partitionBy() did the job.
On Tue, May 10, 2016 at 11:36 PM, Xinh Huynh wrote:
> Hi Ayman,
>
> Have you looked at this:
> http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
>
> It
Well, for Python, it should be rdd.coalesce(10, shuffle=True)
I have had good success with this using the Scala API in Spark 1.6.1.
-Don
On Tue, May 10, 2016 at 3:15 PM, Ayman Khalil wrote:
> And btw, I'm using the Python API if this makes any difference.
>
> On Tue, May 10, 2016 at 11:14 PM,
Hi Ayman,
Have you looked at this:
http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
It recommends defining a custom partitioner and (PairRDD) partitionBy
method to accomplish this.
Xinh
On Tue, May 10, 2016 at 1:15 PM,
And btw, I'm using the Python API if this makes any difference.
On Tue, May 10, 2016 at 11:14 PM, Ayman Khalil
wrote:
> Hi Don,
>
> This didn't help. My original rdd is already created using 10 partitions.
> As a matter of fact, after trying with rdd.coalesce(10, shuffle =
> true) out of curiosi
Hi Don,
This didn't help. My original rdd is already created using 10 partitions.
As a matter of fact, after trying with rdd.coalesce(10, shuffle = true) out
of curiosity, the rdd partitions became even more imbalanced:
[(0, 5120), (1, 5120), (2, 5120), (3, 5120), (4, *3920*), (5, 4096), (6,
5120)
You can call rdd.coalesce(10, shuffle = true) and the returning rdd will be
evenly balanced. This obviously triggers a shuffle, so be advised it could
be an expensive operation depending on your RDD size.
-Don
On Tue, May 10, 2016 at 12:38 PM, Ayman Khalil
wrote:
> Hello,
>
> I have 50,000 ite
Hello,
I have 50,000 items parallelized into an RDD with 10 partitions, I would
like to evenly split the items over the partitions so:
50,000/10 = 5,000 in each RDD partition.
What I get instead is the following (partition index, partition count):
[(0, 4096), (1, 5120), (2, 5120), (3, 5120), (4,