[ 
https://issues.apache.org/jira/browse/FLINK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249428#comment-15249428
 ] 

ASF GitHub Bot commented on FLINK-2998:
---------------------------------------

Github user gallenvara commented on the pull request:

    https://github.com/apache/flink/pull/1838#issuecomment-212300584
  
    @fhueske PR updated.
    I got a little confused while writing the tests. The original dataset is
first handled by a `map` operator to ensure that the type of the partition key
is the same as the type of the boundaries in the supplied distribution. In the
`DataDistribution` interface, the return type of the `getBucketBoundary` method
is `Object[]`. My question is whether this can be changed to `Tuple`. I mean
that range partitioning by one field would return a `Tuple1`, and by two fields
a `Tuple2`. The `OutputEmitter` would likewise change the type of the keys from
`Object[]` to `Tuple` and compare the key with the boundary using a `Tuple`
comparator. If this is possible, the boundaries in the distribution for the
rangePartition test would be:
    `Tuple2<Integer, Long>[] boundaries = new Tuple2[]{
    new Tuple2(1, 1L),
    new Tuple2(3, 2L),
    ....
    }`
    This can make the test more succinct and direct.
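    As a rough illustration of the idea (a sketch in plain Java, not Flink's
    actual `OutputEmitter` code): `IntLongKey` below is a hypothetical stand-in
    for `Tuple2<Integer, Long>`, `findBucket` is a made-up helper, and the
    sketch assumes that each boundary is the inclusive upper bound of its
    bucket and that the boundaries are sorted in ascending order.

    ```java
    final class RangeBucketSketch {

        /** Hypothetical stand-in for Tuple2<Integer, Long>: compares f0 first, then f1. */
        static final class IntLongKey implements Comparable<IntLongKey> {
            final int f0;
            final long f1;

            IntLongKey(int f0, long f1) {
                this.f0 = f0;
                this.f1 = f1;
            }

            @Override
            public int compareTo(IntLongKey other) {
                int c = Integer.compare(f0, other.f0);
                return c != 0 ? c : Long.compare(f1, other.f1);
            }
        }

        /**
         * Binary search for the first boundary that is >= key.
         * Returns boundaries.length when the key is larger than every boundary,
         * so n boundaries yield n + 1 buckets (an assumption of this sketch).
         */
        static <K extends Comparable<K>> int findBucket(K key, K[] sortedBoundaries) {
            int low = 0;
            int high = sortedBoundaries.length;
            while (low < high) {
                int mid = (low + high) >>> 1;
                if (key.compareTo(sortedBoundaries[mid]) <= 0) {
                    high = mid;      // key fits in bucket mid or an earlier one
                } else {
                    low = mid + 1;   // key belongs to a later bucket
                }
            }
            return low;
        }

        public static void main(String[] args) {
            IntLongKey[] boundaries = {new IntLongKey(1, 1L), new IntLongKey(3, 2L)};
            System.out.println(findBucket(new IntLongKey(1, 0L), boundaries)); // 0
            System.out.println(findBucket(new IntLongKey(2, 9L), boundaries)); // 1
            System.out.println(findBucket(new IntLongKey(3, 3L), boundaries)); // 2
        }
    }
    ```

    With a comparable composite key, the boundary lookup needs only one
    `compareTo` call per step instead of a field-by-field walk over `Object[]`.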
    Another thing that confuses me is why partitionByHash and partitionByRange
do not support KeySelectors that return a Tuple type, such as:
    ```
    public static class KeySelector3
            implements KeySelector<Tuple3<Integer, Long, String>, Tuple2<Integer, Long>> {
        private static final long serialVersionUID = 1L;

        @Override
        public Tuple2<Integer, Long> getKey(Tuple3<Integer, Long, String> in) {
            return new Tuple2<>(in.f0, in.f1);
        }
    }
    ```
    As a result, the following code does not run:
    ```
    DataSet<Tuple3<Integer,Long,String>> dataSet = ...;
    dataSet.partitionByRange(new KeySelector3());
    ```
    Can you explain this to me? Thanks!


> Support range partition comparison for multi input nodes.
> ---------------------------------------------------------
>
>                 Key: FLINK-2998
>                 URL: https://issues.apache.org/jira/browse/FLINK-2998
>             Project: Flink
>          Issue Type: New Feature
>          Components: Optimizer
>            Reporter: Chengxiang Li
>            Priority: Minor
>
> The optimizer could potentially optimize the DAG when it finds that two
> input range partitionings are equivalent, but we do not support this
> comparison yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
