Re: Reply: Limit Query Performance Suggestion

2017-01-18 Thread Liang-Chi Hsieh
Hi zhenhua,
Thanks for the idea. Actually, I think we can completely avoid shuffling the data in a limit operation, whether it is LocalLimit or GlobalLimit.
wangzhenhua (G) wrote
> How about this:
> 1. we can make LocalLimit shuffle to multiple partitions, i.e. create a new
> partitioner to unifor

Reply: Limit Query Performance Suggestion

2017-01-18 Thread wangzhenhua (G)
How about this:
1. we can make LocalLimit shuffle to multiple partitions, i.e. create a new partitioner to uniformly dispatch the data
    class LimitUniformPartitioner(partitions: Int) extends Partitioner {
      def numPartitions: Int = partitions
      var num = 0
      def getPartition(key: Any): Int = {
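The excerpt above is cut off before the body of getPartition, so the author's actual logic is not visible. As an illustration of what "uniformly dispatch the data" could look like, here is a minimal sketch of a round-robin partitioner. It is an assumption, not the code from the message: the class name RoundRobinPartitioner, the counter name next, and the Demo object are all hypothetical, and the small Partitioner class below is only a local stand-in that mirrors the two abstract methods of org.apache.spark.Partitioner so the sketch runs without Spark on the classpath.

```scala
// Local stand-in mirroring org.apache.spark.Partitioner (numPartitions, getPartition),
// declared here only so the sketch compiles and runs without Spark.
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical round-robin partitioner (an assumption, not the code from the thread):
// it ignores the key and cycles through the partition ids 0 until partitions.
class RoundRobinPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0, "partitions must be positive")

  // Id of the partition that the next record will be sent to.
  private var next = 0

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = {
    val target = next
    next = (next + 1) % partitions
    target
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val partitioner = new RoundRobinPartitioner(3)
    // Dispatch ten records and count how many land in each partition.
    val targets = (1 to 10).map(i => partitioner.getPartition(i))
    val counts = targets.groupBy(identity).map { case (id, hits) => id -> hits.size }
    counts.toSeq.sortBy(_._1).foreach { case (id, n) => println(s"partition $id: $n records") }
  }
}
```

With this scheme the per-partition counts differ by at most one, whatever the keys are. One caveat on the design: the partitioner is stateful, so in a real Spark job each task would work on its own deserialized copy of the counter, and a retried task may assign records differently than the first attempt did.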