Re: coalescing RDD into equally sized partitions

2014-03-26 Thread Walrus theCat
For the record, I tried this, and it worked.

On Wed, Mar 26, 2014 at 10:51 AM, Walrus theCat wrote:
> Oh, so if I had something more reasonable, like RDDs full of tuples of,
> say, (Int,Set,Set), I could expect a more uniform distribution?
>
> Thanks
>
> On Mon, Mar 24, 2014 at 11:11 PM, Ma

Re: coalescing RDD into equally sized partitions

2014-03-26 Thread Walrus theCat
Oh, so if I had something more reasonable, like RDDs full of tuples of, say, (Int,Set,Set), I could expect a more uniform distribution?

Thanks

On Mon, Mar 24, 2014 at 11:11 PM, Matei Zaharia wrote:
> This happened because they were integers equal to 0 mod 5, and we used the
> default hashCod

Re: coalescing RDD into equally sized partitions

2014-03-24 Thread Matei Zaharia
This happened because they were integers equal to 0 mod 5, and we used the default hashCode implementation for integers, which will map them all to 0. There’s no API method that will look at the resulting partition sizes and rebalance them, but you could use another hash function.

Matei

On Mar
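
To illustrate the effect Matei describes, here is a small stand-alone sketch in plain Python. It is not Spark code. It assumes 5 target partitions (the thread only says the keys were 0 mod 5) and assumes the partition index is the key's hash code taken modulo the partition count, with an integer's hash code being the integer itself, as with Java's Integer.hashCode. The names default_partition and mixed_partition are hypothetical, and the bit-mixing step is just one possible "other hash function", not something proposed in the thread.

```python
from collections import Counter

NUM_PARTITIONS = 5  # assumption: the thread does not state the partition count


def default_partition(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash partitioning with an identity hash: partition = key mod num_partitions."""
    # Python's % is already non-negative for a positive modulus.
    return key % num_partitions


def mixed_partition(key: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hypothetical alternative: scramble the bits before taking the modulus."""
    h = (key * 0x9E3779B1) & 0xFFFFFFFF  # multiplicative hash, kept to 32 bits
    h ^= h >> 16                         # fold the high bits into the low bits
    return h % num_partitions


keys = range(0, 5000, 5)  # 1000 integers, all equal to 0 mod 5

default_sizes = Counter(default_partition(k) for k in keys)
mixed_sizes = Counter(mixed_partition(k) for k in keys)

print("default:", sorted(default_sizes.items()))  # every key lands in partition 0
print("mixed:  ", sorted(mixed_sizes.items()))    # keys spread over several partitions
```

With the identity hash, all 1000 keys fall into partition 0 and the other four partitions stay empty. Mixing the bits first breaks the alignment between the keys' common factor and the partition count, though it still does not guarantee exactly equal sizes.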