Re: Custom Partitioner

2015-09-02 Thread Jem Tucker
alter the range partitioner such that it skews the partitioning and assigns more partitions to the more heavily weighted keys? To do this you will have to know the weighting before you start. On Wed, Sep 2, 2015 at 8:02 AM shahid ashraf wrote: > yes i can take as an example , but my actual use case is
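
One possible reading of this suggestion, as a minimal sketch rather than anything posted in the thread: if the per-key weights are known up front, range boundaries can be chosen so that each partition receives roughly equal total weight instead of an equal number of keys. The names `weighted_bounds` and `make_range_partitioner` are hypothetical, and the weights are assumed to be supplied as a plain dict.

```python
from bisect import bisect_left


def weighted_bounds(weights, num_partitions):
    """Pick upper range boundaries so each range holds roughly equal weight.

    weights: dict mapping key -> known weight (assumed to be available
    before partitioning starts, as the thread points out).
    Returns at most num_partitions - 1 boundary keys, in sorted order.
    """
    target = sum(weights.values()) / num_partitions
    bounds = []
    running = 0
    for key in sorted(weights):
        running += weights[key]
        # Close the current range once the cumulative weight reaches
        # the share owed to the ranges closed so far plus this one.
        if len(bounds) < num_partitions - 1 and running >= target * (len(bounds) + 1):
            bounds.append(key)
    return bounds


def make_range_partitioner(bounds):
    """Return a function mapping a key to the index of its range."""
    def partition(key):
        # Keys <= bounds[0] go to 0, keys in (bounds[0], bounds[1]] to 1, ...
        return bisect_left(bounds, key)
    return partition
```

The returned function is intended to be passed as `partitionFunc` to PySpark's `rdd.partitionBy(numPartitions, partitionFunc)`. Note that a single key is never split across partitions here, so one key heavier than the per-partition target will still dominate its partition.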

Re: Custom Partitioner

2015-09-02 Thread shahid ashraf
Yes, I can take that as an example, but my actual use case is that I need to resolve a data skew. When I group by key (A-Z), the resulting partitions are skewed like this, given as (partition no., no. of keys, total elements with given key): << partition: [(0, 0, 0), (1, 15, 17395), (2, 0, 0), (3, 0, 0), (4, 1
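
A per-partition report in the (partition no., no. of keys, total elements) shape shown above could be produced along these lines. This is a sketch only: `partition_stats` is a hypothetical helper, and it assumes the partition holds ungrouped (key, value) pairs.

```python
def partition_stats(index, records):
    """Summarise one partition as (partition no., distinct keys, elements).

    index:   partition number
    records: iterable of (key, value) pairs in that partition
    Written as a generator so it can be handed to PySpark's
    rdd.mapPartitionsWithIndex, which expects f(index, iterator) -> iterator.
    """
    keys = set()
    total = 0
    for key, _value in records:
        keys.add(key)
        total += 1
    yield (index, len(keys), total)
```

With a pair RDD, something like `rdd.mapPartitionsWithIndex(partition_stats).collect()` would return one tuple per partition, which makes empty partitions such as (0, 0, 0) easy to spot.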

Re: Custom Partitioner

2015-09-01 Thread Davies Liu
You can take sortByKey as an example: https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L642 On Tue, Sep 1, 2015 at 3:48 AM, Jem Tucker wrote: > something like... > > class RangePartitioner(Partitioner): > def __init__(self, numParts): > self.numPartitions = numParts > self.parti

Re: Custom Partitioner

2015-09-01 Thread Jem Tucker
something like...

    class RangePartitioner(Partitioner):
        def __init__(self, numParts):
            self.numPartitions = numParts
            self.partitionFunction = rangePartition

    def rangePartition(key):
        # Logic to turn key into a partition id
        return id

On Tue, Sep 1, 2015 at 11:38 AM shahid ashraf wrote: > Hi > > I t
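
To make the "logic to turn key into a partition id" step concrete, here is a minimal sketch for string keys whose first letter is A-Z, splitting the alphabet into equal-width ranges. The name `make_letter_range_partition` is hypothetical; the result is a plain function meant to be passed to PySpark's `rdd.partitionBy(numPartitions, partitionFunc)` rather than a `Partitioner` subclass.

```python
import string


def make_letter_range_partition(num_partitions, alphabet=string.ascii_uppercase):
    """Build a partition function that maps a key to a range by first letter.

    Assumes every key is a non-empty string starting with a letter from
    `alphabet` (case-insensitive); any other key raises ValueError.
    """
    def range_partition(key):
        # Position of the key's first letter within the alphabet (A -> 0).
        pos = alphabet.index(key[0].upper())
        # Scale the position into [0, num_partitions) with integer arithmetic.
        return pos * num_partitions // len(alphabet)
    return range_partition
```

For an RDD of (key, value) pairs this would be used as `rdd.partitionBy(4, make_letter_range_partition(4))`. Equal-width ranges do nothing about skew by themselves; they only show where the key-to-id logic goes.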

Re: Custom Partitioner

2015-09-01 Thread shahid ashraf
Hi, I think a range partitioner is not available in PySpark, so my question is: if we want to create one, how should we do it? On Tue, Sep 1, 2015 at 3:57 PM, Jem Tucker wrote: > Ah sorry I miss read your question. In pyspark it looks like you just need > to instantiate the Partitioner cla

Re: Custom Partitioner

2015-09-01 Thread Jem Tucker
Ah, sorry, I misread your question. In PySpark it looks like you just need to instantiate the Partitioner class with numPartitions and partitionFunc. On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf wrote: > Hi > > I did not get this, e.g if i need to create a custom partitioner like > range partit

Re: Custom Partitioner

2015-09-01 Thread shahid ashraf
Hi, I did not get this, e.g. if I need to create a custom partitioner like a range partitioner. On Tue, Sep 1, 2015 at 3:22 PM, Jem Tucker wrote: > Hi, > > You just need to extend Partitioner and override the numPartitions and > getPartition methods, see below > > class MyPartitioner extends partit

Re: Custom Partitioner

2015-09-01 Thread Jem Tucker
Hi, You just need to extend Partitioner and override the numPartitions and getPartition methods, see below:

    class MyPartitioner extends Partitioner {
      def numPartitions: Int = // Return the number of partitions
      def getPartition(key: Any): Int = // Return the partition for a given key
    }

On Tue,

Re: Custom partitioner

2015-07-26 Thread Ted Yu
You can write a subclass of Partitioner whose getPartition() returns the partition number corresponding to the given key. Take a look at core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala for an example. Cheers On Sun, Jul 26, 2015 at 1:43 PM, Hafiz Mujadid wrote: > Hi > > I hav