Yes, I can take that as an example, but my actual use case is that I need to resolve a data skew. When I do grouping based on key (A-Z), the resulting partitions are skewed. As (partition no., no. of keys, total elements with the given keys):

partition: [(0, 0, 0), (1, 15, 17395), (2, 0, 0), (3, 0, 0), (4, 13, 18196), (5, 0, 0), (6, 0, 0), (7, 0, 0), (8, 1, 1), (9, 0, 0)]

The data has been skewed to partitions 1 and 4. I need to split those partitions, do processing on the split partitions, and then be able to combine the split partitions back as well.
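One common way to split a skewed key is salting: append a small random suffix to each hot key before grouping, so one heavy key becomes several lighter sub-keys, then strip the suffix to combine the pieces back. A minimal sketch of the key transformation in plain Python (the names `salt_key`/`unsalt_key`, the `#` separator, the salt count, and the hot-key set are my own choices; in PySpark you would apply these with `map` around your `groupByKey`/`partitionBy` calls):

```python
import random

NUM_SALTS = 4            # how many pieces to split each hot key into (assumed)
HOT_KEYS = {"B", "E"}    # keys known to be skewed (assumed, from prior inspection)

def salt_key(key):
    """Turn a hot key into one of NUM_SALTS synthetic sub-keys."""
    if key in HOT_KEYS:
        return "%s#%d" % (key, random.randrange(NUM_SALTS))
    return key

def unsalt_key(key):
    """Strip the salt suffix so sub-groups can be combined back."""
    return key.split("#", 1)[0]

# Sketch of use on an RDD of (key, value) pairs:
#   salted   = rdd.map(lambda kv: (salt_key(kv[0]), kv[1]))
#   ...process per salted key, spread over several partitions...
#   combined = processed.map(lambda kv: (unsalt_key(kv[0]), kv[1])) \
#                       .reduceByKey(merge)   # merge: your combine function
```

Each hot key is spread over up to NUM_SALTS sub-keys (and hence partitions), so no single partition holds a whole heavy group; removing the suffix and reducing by the original key recombines the split pieces afterwards.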
On Tue, Sep 1, 2015 at 10:42 PM, Davies Liu <dav...@databricks.com> wrote:
> You can take the sortByKey as example:
> https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L642
>
> On Tue, Sep 1, 2015 at 3:48 AM, Jem Tucker <jem.tuc...@gmail.com> wrote:
> > something like...
> >
> > class RangePartitioner(Partitioner):
> >     def __init__(self, numParts):
> >         self.numPartitions = numParts
> >         self.partitionFunc = self.rangePartition
> >
> >     def rangePartition(self, key):
> >         # logic to turn the key into a partition id
> >         return partition_id
> >
> > On Tue, Sep 1, 2015 at 11:38 AM shahid ashraf <sha...@trialx.com> wrote:
> >> Hi
> >>
> >> I think the range partitioner is not available in pyspark, so we want
> >> to create one. How should we create it? That is my question.
> >>
> >> On Tue, Sep 1, 2015 at 3:57 PM, Jem Tucker <jem.tuc...@gmail.com> wrote:
> >>> Ah sorry, I misread your question. In pyspark it looks like you just
> >>> need to instantiate the Partitioner class with numPartitions and
> >>> partitionFunc.
> >>>
> >>> On Tue, Sep 1, 2015 at 11:13 AM shahid ashraf <sha...@trialx.com> wrote:
> >>>> Hi
> >>>>
> >>>> I did not get this, e.g. if I need to create a custom partitioner
> >>>> like a range partitioner.
> >>>>
> >>>> On Tue, Sep 1, 2015 at 3:22 PM, Jem Tucker <jem.tuc...@gmail.com> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> You just need to extend Partitioner and override the numPartitions
> >>>>> and getPartition methods, see below
> >>>>>
> >>>>> class MyPartitioner extends Partitioner {
> >>>>>   def numPartitions: Int = // return the number of partitions
> >>>>>   def getPartition(key: Any): Int = // return the partition for a
> >>>>>                                     // given key
> >>>>> }
> >>>>>
> >>>>> On Tue, Sep 1, 2015 at 10:15 AM shahid qadri <shahidashr...@icloud.com> wrote:
> >>>>>> Hi Sparkians
> >>>>>>
> >>>>>> How can we create a custom partitioner in pyspark?

--
with Regards
Shahid Ashraf
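Since rdd.partitionBy in PySpark just takes numPartitions and a partitionFunc, the RangePartitioner sketch quoted above can be completed as a plain function. A minimal runnable sketch for single-letter A-Z keys (the even split of the alphabet into contiguous ranges, and the name make_range_partitioner, are my own assumptions):

```python
def make_range_partitioner(num_partitions):
    """Return a partitionFunc that maps keys starting with 'A'..'Z'
    onto num_partitions contiguous ranges of the alphabet."""
    span = 26.0 / num_partitions  # letters per partition
    def range_partition(key):
        offset = ord(key[0].upper()) - ord("A")       # 0..25
        return min(int(offset / span), num_partitions - 1)
    return range_partition

part = make_range_partitioner(10)
# usage sketch in PySpark: rdd.partitionBy(10, part)
part("A")  # -> 0
part("Z")  # -> 9
```

Because the ranges are contiguous, keys stay in alphabetical order across partitions, which is what a range partitioner gives you over a plain hash.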
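To check whether a given partitionFunc actually removes the skew, the same (partition, no. of keys, total elements) summary shown in the first message can be computed locally on a sample of pairs before running a job. A small sketch in plain Python (partition_summary and the sample pairs are hypothetical; on a real RDD you could inspect rdd.glom() output instead):

```python
from collections import defaultdict

def partition_summary(pairs, num_partitions, partition_func):
    """Return [(partition, n_distinct_keys, n_elements), ...] for a
    list of (key, value) pairs under the given partition function."""
    keys = defaultdict(set)
    counts = defaultdict(int)
    for key, _ in pairs:
        p = partition_func(key)
        keys[p].add(key)
        counts[p] += 1
    return [(p, len(keys[p]), counts[p]) for p in range(num_partitions)]

pairs = [("A", 1), ("A", 2), ("B", 3), ("Z", 4)]
print(partition_summary(pairs, 4, lambda k: (ord(k) - ord("A")) % 4))
# -> [(0, 1, 2), (1, 2, 2), (2, 0, 0), (3, 0, 0)]
```

Comparing this summary before and after salting (or after swapping in a different partitionFunc) shows directly whether the element counts even out across partitions.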