Hi zhenhua,
Thanks for the idea.
Actually, I think we can completely avoid shuffling the data in a limit
operation, whether it is a LocalLimit or a GlobalLimit.
wangzhenhua (G) wrote
How about this:
1. we can make LocalLimit shuffle to multiple partitions, i.e. create a new
partitioner to uniformly dispatch the data
class LimitUniformPartitioner(partitions: Int) extends Partitioner {
def numPartitions: Int = partitions
var num = 0
def getPartition(key: Any): Int = {