Hi Ankur,

When sortByKey() is executed, it range-partitions the RDD by key, which
replaces the partitioning you applied earlier. After execution, each
partition contains a contiguous, sorted range of your keys, which means that
keys sharing the same sub-string may end up in different partitions.
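
To illustrate the difference, here is a minimal plain-Java sketch (not Spark
code). The class name, method names and hard-coded range bounds are made up
for illustration; Spark derives its range bounds by sampling the keys, so
treat this as a rough model of the behaviour rather than the actual
implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: contrasts prefix-hash partitioning with
// a simplified form of range partitioning.
class RangePartitionSketch {

    // Stand-in for a custom partitioner: bucket by the hash of the first
    // 7 characters, keeping the result in [0, numPartitions).
    static int prefixPartition(String key, int numPartitions) {
        String dept = key.substring(0, Math.min(7, key.length()));
        return Math.floorMod(dept.hashCode(), numPartitions);
    }

    // Simplified range partitioning: upperBounds is sorted ascending, and a
    // key goes to the first range whose upper bound is >= the key. Keys
    // greater than every bound go to the last partition.
    static int rangePartition(String key, List<String> upperBounds) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (key.compareTo(upperBounds.get(i)) <= 0) {
                return i;
            }
        }
        return upperBounds.size();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("DEPT001-a", "DEPT001-b", "DEPT002-a");
        // Assumed bounds; a single bound yields two range partitions.
        List<String> bounds = Arrays.asList("DEPT001-a");
        for (String key : keys) {
            System.out.println(key
                + " prefix partition=" + prefixPartition(key, 8)
                + " range partition=" + rangePartition(key, bounds));
        }
    }
}
```

In this sketch "DEPT001-a" and "DEPT001-b" always share a prefix partition,
but a range bound falling between them puts them in different range
partitions.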

Liquan

On Thu, Oct 2, 2014 at 11:09 AM, Ankur Srivastava <
[email protected]> wrote:

> Hi All,
>
> I got past the first problem: I am now able to create partitions in which
> all keys in a partition share the same sub-string. I got that working by
> setting the number of worker threads to more than 1, as I am running the
> application from Eclipse on localhost.
>
> But the issue with sorting still remains.
>
> So after I have partitioned the RDD, I invoke partitionedRdd.sortByKey(),
> but now each partition only has pairs which have the same key.
>
> One thing I wanted to mention is that I am using CassandraJavaRDD for this.
>
> Thanks
> - Ankur
>
> On Wed, Oct 1, 2014 at 10:12 PM, Ankur Srivastava <
> [email protected]> wrote:
>
>> Hi,
>>
>> I am using a custom partitioner to partition my JavaPairRDD, where the
>> key is a String.
>>
>> I use the hashCode of a sub-string of the key to derive the partition
>> index, but I have noticed that a partition contains keys for which the
>> partitioner returns a different partition index.
>>
>> Another issue I am facing is that when I sort the RDD further after
>> partitioning, each partition has only keys which are equal.
>>
>> My Partitioner is as below:
>>
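>> import org.apache.spark.Partitioner;
>>
>> public class BlockPartitioner extends Partitioner {
>>
>>     private int numPartitions = 8;
>>
>>     @Override
>>     public int numPartitions() {
>>         return numPartitions;
>>     }
>>
>>     @Override
>>     public int getPartition(Object key) {
>>         // The key is a String; partition on its first 7 characters.
>>         String dept = ((String) key).substring(0, 7);
>>         int partitionId = dept.hashCode();
>>         // hashCode() can be negative; floorMod keeps the index
>>         // in the range [0, numPartitions).
>>         return Math.floorMod(partitionId, numPartitions);
>>     }
>> }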
>>
>> I am using "foreachPartition" of the Java pair RDD to verify my
>> partitions.
>>
>> Thanks
>> Ankur
>>
>
>


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
