[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952773#comment-14952773 ]
ASF GitHub Bot commented on FLINK-7:
------------------------------------

GitHub user ChengXiangLi opened a pull request:

https://github.com/apache/flink/pull/1255

[FLINK-7] [Runtime] Enable Range Partitioner.

This PR enables the range partitioner for Flink, following the same path as the other existing partitioners. It relies on the sample operator to randomly sample data from a `DataSet` and builds the range boundaries from the sampled data.

Two other notes about this PR:
1. Why is the sample job executed in `JobGraphGenerator` instead of `PartitionOperator`?
   i. Launching another job at compile time would lead to infinite job submission, because the `DataSink`s have not yet been cleared at compile time.
   ii. We need the target stage parallelism to decide the sample size, and the `TypeSerializer`/`TypeComparator` to serialize and sort the sampled data.
2. The `DataDistribution` API is expanded. The previous `DataDistribution` takes `Key[]` as range boundaries, but there is no simple generic way to extract a Key from a nested object, and `TypeComparator::compareAgainstReference()` is not supported by the current comparators. Using `DataSet` elements as the range boundaries makes everything much easier: we can use `TypeComparator::compare()` directly, both for sorting while building the `DataDistribution` and for selecting the channel.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ChengXiangLi/flink rangepartitioner

Alternatively, you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1255.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1255

----
commit 8a41b18c6c40115d545271039e51ebad44300191
Author: chengxiang li <chengxiang...@intel.com>
Date: 2015-10-12T07:13:38Z

[FLINK-7] [Runtime] Enable Range Partitioner.
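The pipeline the PR describes (sample the input, derive range boundaries from the sorted sample, route each record by comparing it against those boundaries) can be sketched in plain Java. This is a minimal, hypothetical illustration, not Flink's implementation: the class `RangePartitionSketch` and the methods `reservoirSample`, `buildBoundaries` and `selectChannel` are invented for this example; reservoir sampling stands in for Flink's sample operator (the PR does not say which sampling algorithm it uses); and a `java.util.Comparator` plays the role of `TypeComparator::compare()`. The sample size `k` is left to the caller here, whereas the PR derives it from the target parallelism.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of range partitioning; NOT Flink's actual API.
public class RangePartitionSketch {

    // Reservoir sampling (Algorithm R): keeps a uniform random sample of at most k elements.
    public static <T> List<T> reservoirSample(Iterable<T> input, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T element : input) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(element);
            } else {
                // Keep the new element with probability k / seen, evicting a random slot.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, element);
                }
            }
        }
        return reservoir;
    }

    // Sorts the sample and picks (parallelism - 1) evenly spaced elements as boundaries.
    public static <T> List<T> buildBoundaries(List<T> sample, int parallelism, Comparator<? super T> cmp) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<T> sorted = new ArrayList<>(sample);
        sorted.sort(cmp);
        List<T> boundaries = new ArrayList<>();
        if (sorted.isEmpty()) {
            return boundaries; // no sample: every record falls into channel 0
        }
        for (int i = 1; i < parallelism; i++) {
            // i-th quantile of the sorted sample; always < sorted.size() because i < parallelism.
            int idx = (int) ((long) i * sorted.size() / parallelism);
            boundaries.add(sorted.get(idx));
        }
        return boundaries;
    }

    // Binary search for the first boundary >= record; the result is a channel in [0, boundaries.size()].
    public static <T> int selectChannel(T record, List<T> boundaries, Comparator<? super T> cmp) {
        int lo = 0;
        int hi = boundaries.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cmp.compare(boundaries.get(mid), record) < 0) {
                lo = mid + 1; // boundary is smaller: record belongs to a later channel
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        Comparator<Integer> cmp = Comparator.naturalOrder();
        List<Integer> sample = reservoirSample(data, 100, new Random(42));
        List<Integer> boundaries = buildBoundaries(sample, 4, cmp);
        for (int x : new int[] {0, 500, 999}) {
            System.out.println(x + " -> channel " + selectChannel(x, boundaries, cmp));
        }
    }
}
```

Because the boundaries are whole elements, as the PR proposes, `selectChannel` needs only a single compare function and no key extraction. A record equal to a boundary goes to the lower of the two adjacent channels.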
----

> [GitHub] Enable Range Partitioner
> ---------------------------------
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Runtime
> Reporter: GitHub Import
> Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the following aspects:
> 1) Distribution information, if available, must be propagated back together with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
> - TeraSortITCase
> - GlobalSortingITCase
> - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer,
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)