[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063322#comment-15063322 ]
ASF GitHub Bot commented on FLINK-7:
------------------------------------

Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/flink/pull/1255#issuecomment-165652344

    Hi, @fhueske,

    For the partitioning part, I think it's normal that `RangePartition` is slower than `HashPartition`; as you mentioned, `RangePartition` introduces more overhead. The main difference between the two is that `HashPartition` is a key-wise partitioning (elements with the same key are shuffled to the same target), while `RangePartition` is both key-wise and partition-wise (the partitions themselves are in order as well). So for a global order we can sort in parallel after `RangePartition`; that is the benefit we get from it.

    On the other hand, it still makes sense to improve `RangePartition` performance, although I don't think increasing the sample size would help here. Based on my previous calculation and tests, a sample size of `parallelism * 20` is enough to generate well-proportioned partitions. Did you see data skew in any partition after `RangePartition`?


> [GitHub] Enable Range Partitioner
> ---------------------------------
>
>                 Key: FLINK-7
>                 URL: https://issues.apache.org/jira/browse/FLINK-7
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Runtime
>            Reporter: GitHub Import
>            Assignee: Chengxiang Li
>             Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the following aspects:
> 1) Distribution information, if available, must be propagated back together with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer,
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
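
For readers who want to see the hash-versus-range distinction from the comment above in concrete terms, here is a minimal Python sketch. It is not Flink's implementation: the function names (`hash_partition`, `range_boundaries`, `range_partition`) and the `samples_per_partition` parameter are hypothetical, and the only detail taken from the comment is the sample size of `parallelism * 20`. The sketch assumes a non-empty list of comparable keys.

```python
import bisect
import random


def hash_partition(keys, parallelism):
    """Key-wise only: equal keys share a partition, partitions have no order."""
    parts = [[] for _ in range(parallelism)]
    for k in keys:
        parts[hash(k) % parallelism].append(k)
    return parts


def range_boundaries(keys, parallelism, samples_per_partition=20, seed=0):
    """Pick parallelism - 1 split points from a random sample of the keys.

    Assumes `keys` is a non-empty list. The default sample size is
    parallelism * 20, the figure quoted in the comment.
    """
    rng = random.Random(seed)
    sample_size = min(len(keys), parallelism * samples_per_partition)
    sample = sorted(rng.sample(keys, sample_size))
    # Evenly spaced quantiles of the sorted sample become the split points.
    return [sample[(i * sample_size) // parallelism] for i in range(1, parallelism)]


def range_partition(keys, boundaries):
    """Key-wise and partition-wise: every key in partition i is <= every key in partition i + 1."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        # bisect_left sends a key equal to a boundary into the lower partition.
        parts[bisect.bisect_left(boundaries, k)].append(k)
    return parts


if __name__ == "__main__":
    keys = list(range(1000))
    random.Random(1).shuffle(keys)
    parts = range_partition(keys, range_boundaries(keys, parallelism=4))
    # Each partition can be sorted independently (in parallel in a real system);
    # concatenating the results in partition order gives the global order.
    globally_sorted = [k for part in parts for k in sorted(part)]
    print(globally_sorted == sorted(keys))
    print([len(part) for part in parts])
```

Printing the partition sizes is one simple way to check for the data skew the comment asks about: sizes far from `len(keys) / parallelism` would suggest that the sampled boundaries are poorly placed.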