For the record, I tried this, and it worked.
On Wed, Mar 26, 2014 at 10:51 AM, Walrus theCat wrote:
Oh, so if I had something more reasonable, like RDDs full of tuples of,
say, (Int, Set, Set), I could expect a more uniform distribution?
Thanks
On Mon, Mar 24, 2014 at 11:11 PM, Matei Zaharia wrote:
This happened because they were integers equal to 0 mod 5, and we used the
default hashCode implementation for integers, which returns the value itself;
taken mod 5, that maps them all to partition 0.
There’s no API method that will look at the resulting partition sizes and
rebalance them, but you could use another hash function.
Matei
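To make the explanation above concrete, here is a small stand-alone simulation in Python (no Spark needed). It assumes 5 partitions, 200 integer keys that are all multiples of 5, and a partitioner that takes the key's hashCode modulo the partition count, where an integer's hashCode is the integer itself. The "other hash function" shown is Knuth-style multiplicative hashing, chosen here purely as an illustration; `multiplicative_partition` and `KNUTH_CONSTANT` are hypothetical names, not part of any Spark API. In Spark itself, logic like this would live in a custom `Partitioner`.

```python
# Simulate hashCode-modulo partitioning vs. a bit-mixing hash for keys
# that are all equal to 0 mod 5. Assumptions: 5 partitions, and an integer's
# hashCode is the integer itself (as with java.lang.Integer).
from collections import Counter

NUM_PARTITIONS = 5
KNUTH_CONSTANT = 2654435761  # hypothetical choice: odd constant ~ 2**32 / golden ratio


def default_partition(key: int, num_partitions: int) -> int:
    """hashCode % numPartitions, where the hashCode of an int is the int itself."""
    # Python's % is already non-negative for a positive divisor.
    return key % num_partitions


def multiplicative_partition(key: int, num_partitions: int) -> int:
    """Hypothetical alternative: 32-bit multiplicative hashing.

    Multiplying by a large odd constant spreads the key's low bits into the
    high bits; scaling the 32-bit result by num_partitions and keeping the
    top part picks a bucket in [0, num_partitions).
    """
    mixed = (key * KNUTH_CONSTANT) & 0xFFFFFFFF
    return (mixed * num_partitions) >> 32


def partition_sizes(keys, partition_fn, num_partitions):
    """Count how many keys land in each partition."""
    counts = Counter(partition_fn(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


keys = list(range(0, 1000, 5))  # 200 integers, all equal to 0 mod 5

default_sizes = partition_sizes(keys, default_partition, NUM_PARTITIONS)
mixed_sizes = partition_sizes(keys, multiplicative_partition, NUM_PARTITIONS)

print("hashCode % 5:       ", default_sizes)  # → [200, 0, 0, 0, 0]
print("multiplicative hash:", mixed_sizes)
```

With plain modulo, every key collides in partition 0; after mixing, the same keys are spread over all five partitions. The design point is that the partition index should depend on all bits of the key, not only on its residue modulo the partition count.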