Hi,
I am using a custom partitioner to partition my JavaPairRDD, where the key is
a String.
I use the hashCode of a substring of the key to derive the partition index,
but I have noticed that my partitions contain keys for which the partitioner
returns a different partition index.
Another issue I am facing is that when I sort the RDD after partitioning,
each partition contains only keys that are equal.
My partitioner is as below:
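import org.apache.spark.Partitioner;

public class BlockPartitioner extends Partitioner {
    private final int numPartitions = 8;

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    public int getPartition(Object key) {
        // Keys are Strings; the first 7 characters identify the department.
        String dept = ((String) key).substring(0, 7);
        // hashCode() may be negative, so floorMod keeps the index in [0, numPartitions).
        return Math.floorMod(dept.hashCode(), numPartitions);
    }
}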
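One thing that may be worth checking with a hashCode-based index is the sign: String.hashCode() returns a signed int, so a plain % can give a negative remainder, while Math.floorMod stays in range. The following is only a small sketch in plain Java, with no Spark involved; the class name, the method names and the sample keys are made up for illustration.

```java
// Hypothetical helper class, not part of Spark or of the original job.
final class PartitionIndexDemo {

    // Index computed with the % operator: the result takes the sign of the hash.
    static int remainderIndex(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Index computed with Math.floorMod: always in [0, numPartitions) when numPartitions > 0.
    static int floorModIndex(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        // Made-up keys; the first 7 characters play the role of the department prefix.
        for (int i = 0; i < 20; i++) {
            String key = String.format("DEPT%03d-row", i);
            int hash = key.substring(0, 7).hashCode();
            System.out.println(key
                    + " hash=" + hash
                    + " remainder=" + remainderIndex(hash, numPartitions)
                    + " floorMod=" + floorModIndex(hash, numPartitions));
        }
    }
}
```

If any line shows a negative remainder for a prefix, that prefix would not map to a valid partition index with the plain % version.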
I am using "foreachPartition" on the JavaPairRDD to verify my partitions.
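To make the check itself concrete, here is a rough local stand-in for it in plain Java, with no Spark involved: keys are placed into buckets by the same prefix-hash rule, each bucket is sorted on its own, and every key's index is then recomputed and compared with the bucket it sits in. All names here (PartitionCheckDemo, partitionOf, misplaced, the sample keys) are assumptions for illustration and not part of the real job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local simulation of the partition check; not Spark code.
final class PartitionCheckDemo {

    static final int PREFIX_LENGTH = 7;

    // Same rule as the partitioner: hash of the first 7 characters, kept non-negative.
    static int partitionOf(String key, int numPartitions) {
        String dept = key.substring(0, PREFIX_LENGTH);
        return Math.floorMod(dept.hashCode(), numPartitions);
    }

    // Distributes the keys into numPartitions buckets using partitionOf.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionOf(key, numPartitions)).add(key);
        }
        return buckets;
    }

    // Counts keys whose recomputed index differs from the bucket they are stored in.
    static int misplaced(List<List<String>> buckets) {
        int count = 0;
        for (int i = 0; i < buckets.size(); i++) {
            for (String key : buckets.get(i)) {
                if (partitionOf(key, buckets.size()) != i) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up keys: 5 department prefixes with 3 rows each, added in descending row order.
        List<String> keys = new ArrayList<>();
        for (int d = 0; d < 5; d++) {
            for (int r = 3; r > 0; r--) {
                keys.add(String.format("DEPT%03d-%d", d, r));
            }
        }
        List<List<String>> buckets = partition(keys, 8);
        // Sorting inside each bucket does not move keys between buckets.
        for (List<String> bucket : buckets) {
            Collections.sort(bucket);
        }
        System.out.println("misplaced keys: " + misplaced(buckets));
    }
}
```

In the Spark job, the bucket index would presumably come from TaskContext.getPartitionId() inside foreachPartition. If the sort is done with sortByKey, it reshuffles the data by key range, so the custom partitioning is not kept; JavaPairRDD.repartitionAndSortWithinPartitions(partitioner) partitions and sorts within each partition in one step, which is closer to what this sketch does.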
Thanks
Ankur