[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

ASF GitHub Bot (JIRA) Mon, 12 Oct 2015 00:58:28 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952773#comment-14952773
 ]


ASF GitHub Bot commented on FLINK-7:
------------------------------------

GitHub user ChengXiangLi opened a pull request:

    https://github.com/apache/flink/pull/1255

    [FLINK-7] [Runtime] Enable Range Partitioner.

    This PR enables the range partitioner for Flink, following the path of 
the other existing partitioners. It depends on the sample operator to randomly 
sample data from the `DataSet` and builds range boundaries from the sampled 
data. Two further notes about the PR:
    1. Why execute the sample data job in `JobGraphGenerator` instead of 
`PartitionOperator`?
         i. Launching another job at compile time would lead to infinite job 
submission, because the `DataSink`s have not been cleared at compile time.
         ii. We need the target stage parallelism to decide the sample data 
size, and the `TypeSerializer`/`TypeComparator` to serialize/sort the sampled 
data.
    2. Expand the `DataDistribution` API. The previous `DataDistribution` took 
`Key[]` as range boundaries, but there is no simple generic way to extract a 
Key from a nested object, and `TypeComparator::compareAgainstReference()` is 
not supported by the current comparators. Using `DataSet` elements as the 
range boundaries makes everything much easier: we can use 
`TypeComparator::compare()` directly for sorting while building the 
`DataDistribution` and for selecting the channel.
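
For illustration only, the general idea of building range boundaries from a sample and selecting a channel by comparing whole elements could be sketched as below. This is not the code in the PR: the class `RangePartitionSketch` and its methods `buildBoundaries` and `selectChannel` are hypothetical names, a plain `java.util.Comparator` stands in for Flink's `TypeComparator`, and taking evenly spaced elements of the sorted sample as boundaries is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Flink code: a Comparator stands in for TypeComparator.
class RangePartitionSketch {

    // Sort the sample and take (parallelism - 1) evenly spaced elements as boundaries.
    static <T> List<T> buildBoundaries(List<T> sample, Comparator<T> comparator, int parallelism) {
        List<T> boundaries = new ArrayList<>();
        if (sample.isEmpty() || parallelism <= 1) {
            return boundaries; // no boundaries: every record goes to channel 0
        }
        List<T> sorted = new ArrayList<>(sample);
        sorted.sort(comparator);
        for (int i = 1; i < parallelism; i++) {
            int idx = (int) ((long) i * sorted.size() / parallelism);
            boundaries.add(sorted.get(Math.min(idx, sorted.size() - 1)));
        }
        return boundaries;
    }

    // Binary search for the first boundary that is >= record; its index is the channel.
    // Records greater than every boundary go to the last channel (boundaries.size()).
    static <T> int selectChannel(T record, List<T> boundaries, Comparator<T> comparator) {
        int lo = 0;
        int hi = boundaries.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparator.compare(record, boundaries.get(mid)) > 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Because the boundaries are themselves elements of the data set, the same comparator serves both for sorting the sample and for routing records, with no separate key extraction step.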

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/flink rangepartitioner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1255
    
----
commit 8a41b18c6c40115d545271039e51ebad44300191
Author: chengxiang li <chengxiang...@intel.com>
Date:   2015-10-12T07:13:38Z

    [FLINK-7] [Runtime] Enable Range Partitioner.

----


> [GitHub] Enable Range Partitioner
> ---------------------------------
>
>                 Key: FLINK-7
>                 URL: https://issues.apache.org/jira/browse/FLINK-7
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Runtime
>            Reporter: GitHub Import
>            Assignee: Chengxiang Li
>             Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
