What does your MyClass look like? I was experimenting with the Row class in Python, and apparently partitionBy automatically takes the first column as the key. However, I am not sure how you can access part of an object without deserializing it (either explicitly or with Spark doing it for you).
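To make the deserialization point concrete, here is a plain-Python sketch (no Spark involved) of the general idea behind hash partitioning on the first field of a record. The names `Record` and `partition_by_first_field` are hypothetical, and this is not how Spark implements partitionBy; it only illustrates that the key has to be read from a materialized record before a partition can be chosen.

```python
from collections import namedtuple

# Hypothetical stand-in for a Row-like record; this is not a Spark class.
Record = namedtuple("Record", ["user_id", "name"])


def partition_by_first_field(records, num_partitions):
    """Group records into buckets by the hash of their first field."""
    buckets = [[] for _ in range(num_partitions)]
    for record in records:
        # Reading the key requires the record to exist as an object,
        # i.e. it must already have been deserialized.
        key = record[0]
        buckets[hash(key) % num_partitions].append(record)
    return buckets


records = [Record(1, "a"), Record(2, "b"), Record(5, "c"), Record(1, "d")]
buckets = partition_by_first_field(records, 4)
```

Records that share a key always land in the same bucket, which is the property a hash partitioner is meant to give you.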
On Wed, May 6, 2015 at 7:14 PM, Night Wolf <nightwolf...@gmail.com> wrote:
> Hi,
>
> If I have an RDD[MyClass] and I want to partition it by the hash code of
> MyClass for performance reasons, is there any way to do this without
> converting it into a PairRDD RDD[(K,V)] and calling partitionBy?
>
> Mapping it to a Tuple2 seems like a waste of space/computation.
>
> It looks like PairRDDFunctions.partitionBy() uses a ShuffledRDD[K,V,C],
> which requires K, V, and C. Could I create a new
> ShuffledRDD[MyClass,MyClass,MyClass](caseClassRdd, new HashPartitioner)?
>
> Cheers,
> N

--
Best Regards,
Ayan Guha