Hi! I use Spark heavily for various workloads and keep running into the same situation: I have a skewed dataset (with no partitioner assigned) and I just want to "redistribute" its data more evenly.
For example, say there is an RDD with X partitions, each holding Y rows, except for one large partition with Y * 10 rows. I don't want to change the number of partitions, only redistribute the rows among them. Such an operation should not need to send more than ~Y * 9 rows across the network, since only the excess from the large partition has to move. But the only option available is repartition, which performs a full shuffle of ALL (X + 9) * Y rows. So my question is: why is there no operation like "redistribute"?
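To make the arithmetic concrete, here is a minimal Python sketch that only *plans* such a redistribution. It does not use Spark at all, and `balanced_targets` and `plan_redistribution` are hypothetical helper names, not part of any Spark API. Assuming the partition sizes are already known (e.g. from a prior per-partition count), it computes a balanced target size for each partition and pairs partitions that hold surplus rows with partitions that are short, so only the surplus rows move. Actually shipping those rows between executors would still need separate code, which this sketch does not attempt.

```python
from typing import List, Tuple

# Hypothetical helpers: a planning sketch only, not a Spark API.

def balanced_targets(sizes: List[int]) -> List[int]:
    """Target row count per partition: total // n each, with the remainder
    given (one row each) to the currently largest partitions."""
    n = len(sizes)
    base, extra = divmod(sum(sizes), n)
    targets = [base] * n
    # Giving the leftover rows to the biggest partitions avoids moving them.
    for i in sorted(range(n), key=lambda i: sizes[i], reverse=True)[:extra]:
        targets[i] += 1
    return targets


def plan_redistribution(sizes: List[int]) -> List[Tuple[int, int, int]]:
    """Return (src, dst, count) moves that reach the balanced targets
    while moving only the surplus rows of over-full partitions."""
    targets = balanced_targets(sizes)
    surplus = [[i, s - t] for i, (s, t) in enumerate(zip(sizes, targets)) if s > t]
    deficit = [[i, t - s] for i, (s, t) in enumerate(zip(sizes, targets)) if s < t]
    moves = []
    si = di = 0
    # Greedily drain each surplus partition into the next short partition.
    while si < len(surplus) and di < len(deficit):
        count = min(surplus[si][1], deficit[di][1])
        moves.append((surplus[si][0], deficit[di][0], count))
        surplus[si][1] -= count
        deficit[di][1] -= count
        if surplus[si][1] == 0:
            si += 1
        if deficit[di][1] == 0:
            di += 1
    return moves


# X = 4 partitions, Y = 100 rows each, except one with Y * 10 rows.
sizes = [100, 100, 100, 1000]
moves = plan_redistribution(sizes)
print(moves)                          # [(3, 0, 225), (3, 1, 225), (3, 2, 225)]
print(sum(c for _, _, c in moves))    # 675, versus 1300 rows in a full shuffle
```

With X = 4 and Y = 100, the large partition sheds 675 rows, while a full shuffle touches all 1,300. In general the large partition holds 10Y against an average of (X + 9) * Y / X, so it sheds 9Y(X - 1) / X rows, which is just under 9Y.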
