Alter the range partitioner so that it skews the partitioning and assigns
more partitions to the heavier-weighted keys? To do this you will have to
know the weighting before you start.
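A minimal sketch of that idea, assuming the per-key weights are already known (for example from a prior counting pass). The helper name and the block-assignment scheme below are illustrative, not pyspark API:

```python
import random

# Hypothetical helper: give each key a contiguous block of partitions
# proportional to its known weight, then scatter that key's records
# uniformly within the block.
def build_weighted_partitioner(weights, num_partitions):
    total = float(sum(weights.values()))
    blocks = {}
    start = 0
    for key, w in sorted(weights.items()):
        # At least one partition per key, more for heavier keys.
        span = max(1, int(round(num_partitions * w / total)))
        end = min(start + span, num_partitions)
        blocks[key] = (start, end)
        start = end

    def partition_func(key):
        lo, hi = blocks.get(key, (0, num_partitions))
        if hi <= lo:                     # ran out of partitions; wrap around
            return lo % num_partitions
        return random.randrange(lo, hi)

    return partition_func

# e.g. key "A" carries 90% of the data, so it gets 9 of 10 partitions
pf = build_weighted_partitioner({"A": 90, "B": 5, "C": 5}, 10)
```

The resulting function can be handed to `partitionBy` as its `partitionFunc` argument; note that spreading one key over several partitions means a plain `groupByKey` on it no longer collects that key in one place, so this suits aggregations that can be recombined afterwards.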
On Wed, Sep 2, 2015 at 8:02 AM shahid ashraf wrote:
Yes, I can take that as an example, but my actual use case is that I need to
resolve a data skew: when I do grouping based on key (A-Z) the resulting
partitions are skewed like
(partition no.,no_of_keys, total elements with given key)
<< partition: [(0, 0, 0), (1, 15, 17395), (2, 0, 0), (3, 0, 0), (4, 1
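A report in that (partition, distinct_keys, total_elements) shape can be computed from `rdd.glom().collect()`; the small helper below (a hypothetical name, shown with plain lists standing in for partitions so it runs without Spark) produces the same tuples:

```python
# Hypothetical helper mimicking the (partition, n_keys, n_elements)
# tuples above; with Spark, feed it rdd.glom().collect().
def skew_report(partitions):
    report = []
    for i, part in enumerate(partitions):
        distinct_keys = {k for k, _ in part}
        report.append((i, len(distinct_keys), len(part)))
    return report

# Two empty partitions and one holding everything -- the skewed shape above.
print(skew_report([[], [("A", 1), ("A", 2), ("B", 3)], []]))
# -> [(0, 0, 0), (1, 2, 3), (2, 0, 0)]
```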
You can take sortByKey as an example:
https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L642
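The core trick in that sortByKey code is to sample the keys, pick numPartitions - 1 cut points from the sample, and bisect each key into its range. A pure-Python sketch of that idea (the function name is illustrative, not the rdd.py API):

```python
import bisect

# Sketch of sortByKey's boundary trick: derive range boundaries from a
# sample of the keys, then bisect each key into its partition.
def make_range_partition_func(sample_keys, num_partitions):
    srt = sorted(sample_keys)
    # Evenly spaced cut points drawn from the sorted sample.
    bounds = [srt[len(srt) * (i + 1) // num_partitions]
              for i in range(num_partitions - 1)]
    return lambda key: bisect.bisect_left(bounds, key)

pf = make_range_partition_func(list("qwertyuiopasdfghjkl"), 4)
```

In the real implementation the sample comes from `rdd.sample(...)` so the boundaries reflect the actual key distribution, which is what makes the resulting partitions roughly balanced.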
On Tue, Sep 1, 2015 at 3:48 AM, Jem Tucker wrote:
something like...

class RangePartitioner(Partitioner):
    def __init__(self, numParts):
        self.numPartitions = numParts
        self.partitionFunc = self.rangePartition

    def rangePartition(self, key):
        # Logic to turn the key into a partition id
        return partition_id
On Tue, Sep 1, 2015 at 11:38 AM shahid ashraf wrote:
Hi
I think a range partitioner is not available in pyspark, so we would have to
create one. My question is how we should create it.
On Tue, Sep 1, 2015 at 3:57 PM, Jem Tucker wrote:
Ah sorry, I misread your question. In pyspark it looks like you just need
to instantiate the Partitioner class with numPartitions and partitionFunc.
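Put differently, in pyspark a custom partitioner boils down to a plain function over keys, since `RDD.partitionBy` takes numPartitions plus an optional partitionFunc. A sketch, where the first-letter bucketing is purely illustrative:

```python
# Illustrative partition function: bucket A-Z keys by their first letter.
def first_letter_partition(key, num_partitions=4):
    return (ord(key[0].upper()) - ord("A")) % num_partitions

# With a live SparkContext this would be used as:
#   pairs.partitionBy(4, first_letter_partition)
print(first_letter_partition("apple"), first_letter_partition("banana"))
# -> 0 1
```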
On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf wrote:
Hi
I did not get this. E.g., what if I need to create a custom partitioner like a
range partitioner?
On Tue, Sep 1, 2015 at 3:22 PM, Jem Tucker wrote:
Hi,
You just need to extend Partitioner and override the numPartitions and
getPartition methods, see below
class MyPartitioner extends Partitioner {
  def numPartitions: Int = ??? // Return the number of partitions
  def getPartition(key: Any): Int = ??? // Return the partition for a given key
}
On Tue,
You can write a subclass of Partitioner whose getPartition() returns the
partition number corresponding to the given key.
Take a look at
core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala for
an example.
Cheers
On Sun, Jul 26, 2015 at 1:43 PM, Hafiz Mujadid wrote:
> Hi
>
> I hav