[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066420#comment-15066420 ]
ASF GitHub Bot commented on FLINK-7:
------------------------------------

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/1255#issuecomment-166297616

Sorry, @fhueske, I misunderstood your test data: the keys should be skewed on some value, while in my previous test the keys were not skewed. It is complicated to calculate how many samples should be taken from a dataset to meet an a priori specified accuracy guarantee. One such algorithm is described at http://research.microsoft.com/pubs/159275/MSR-TR-2012-18.pdf, which I have used before, but it may not fully fit the case where the keys are skewed. Would you continue testing how many samples are required to make the partitions roughly balanced? Raising the sample number should not add much overhead, so I fully support it.

> [GitHub] Enable Range Partitioner
> ---------------------------------
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Runtime
> Reporter: GitHub Import
> Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the
> following aspects:
> 1) Distribution information, if available, must be propagated back together
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
> - TeraSortITCase
> - GlobalSortingITCase
> - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer,
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
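
The question raised in the comment, how the sample size affects partition balance when keys are skewed, can be explored with a small standalone experiment. The sketch below is not Flink code and not the algorithm from the linked report. It assumes a hypothetical dataset in which 30% of the records share one key, picks range boundaries from evenly spaced ranks of a random sample, and reports the largest partition relative to a perfectly even split. All function names and parameters are made up for illustration.

```python
import bisect
import random
from collections import Counter


def range_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 split points at evenly spaced ranks of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Index of the range a key falls into; a key equal to a boundary goes to the lower range."""
    return bisect.bisect_left(boundaries, key)


def imbalance(keys, boundaries, num_partitions):
    """Size of the largest partition divided by the ideal (perfectly even) size."""
    counts = Counter(partition_of(k, boundaries) for k in keys)
    ideal = len(keys) / num_partitions
    return max(counts.values()) / ideal


def skewed_keys(n, rng):
    # Assumption for illustration: 30% of records share key 500,
    # the rest are uniform in [0, 1000).
    return [500 if rng.random() < 0.3 else rng.randrange(1000) for _ in range(n)]


rng = random.Random(42)
keys = skewed_keys(100_000, rng)
num_partitions = 8

for sample_size in (80, 800, 8000):
    sample = rng.sample(keys, sample_size)
    bounds = range_boundaries(sample, num_partitions)
    print(sample_size, round(imbalance(keys, bounds, num_partitions), 2))
```

Under these assumptions, a larger sample can only tighten the boundaries around the non-skewed keys. Because every record with the hot key lands in the same range, the largest partition holds at least about 30% of the data regardless of the sample size, so sampling alone cannot balance a heavily skewed key.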