I have equal-sized partitions now, but I want the RDD to be partitioned so that the partitions are equally weighted by some attribute of each element (e.g. its size or complexity).
I have been looking at the RangePartitioner code, and I have come up with something like EquallyWeightedPartitioner(noOfPartitions, weightFunction):

1) take the sum (or a sample) of the weights of all elements and calculate the average weight per partition
2) take a histogram of the weights
3) assign a list of partitions to each histogram bucket
4) getPartition(key: Any): Int would
   a) compute the key's weight and find its bucket
   b) return a random partition from the list of partitions associated with that bucket

(A rough sketch of this is below.)

Just wanted to know if someone else had come across this issue before and whether there is a better way of doing this.
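For concreteness, here is a minimal sketch of steps 1-4 as a custom Spark Partitioner. All names here (EquallyWeightedPartitioner, weightOf, bucketUpperBounds, bucketPartitions) are hypothetical, and it assumes the histogram bounds (step 2) and per-bucket partition lists (step 3) have already been computed on the driver from a sample of element weights:

import scala.util.Random
import org.apache.spark.Partitioner

// Hypothetical sketch only. Assumes bucketUpperBounds (step 2) and
// bucketPartitions (step 3) were precomputed from a sample of weights.
class EquallyWeightedPartitioner[K](
    override val numPartitions: Int,
    weightOf: K => Double,               // the weightFunction from above
    bucketUpperBounds: Array[Double],    // step 2: histogram bucket bounds
    bucketPartitions: Array[Seq[Int]])   // step 3: partitions per bucket
  extends Partitioner {

  require(bucketUpperBounds.length == bucketPartitions.length)

  override def getPartition(key: Any): Int = {
    val w = weightOf(key.asInstanceOf[K])
    // step 4a: find the first bucket whose upper bound covers this weight
    val i = bucketUpperBounds.indexWhere(w <= _)
    val bucket = if (i >= 0) i else bucketUpperBounds.length - 1
    // step 4b: pick a random partition from that bucket's list
    val candidates = bucketPartitions(bucket)
    candidates(Random.nextInt(candidates.length))
  }
}

You would then apply it to a pair RDD with rdd.partitionBy(new EquallyWeightedPartitioner(...)). One thing to watch with 4b): picking a random partition means getPartition is not deterministic, so equal keys can land in different partitions. That is fine for a one-off partitionBy to balance load, but it breaks the usual Partitioner contract that joins and lookups rely on; a deterministic alternative would be to hash the key into the bucket's partition list instead of drawing at random.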