[ https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234216#comment-15234216 ]
ASF GitHub Bot commented on FLINK-3665:
---------------------------------------

Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1848#discussion_r59137767

    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/operators/PartitionOperator.java ---
    @@ -98,6 +101,14 @@ public PartitionOperator(DataSet<T> input, Keys<T> pKeys, Partitioner<?> customP
     		this.customPartitioner = customPartitioner;
     		this.distribution = distribution;
     	}
    +
    +	public PartitionOperator<T> withOrders(Order... orders) {
    --- End diff --

    Hi. I started working on this change, but I don't quite know how I should treat a keyExpression (especially one with wildcards). Let's take a complex example:

    ```
    TypeInformation<Tuple3<Integer, Pojo1, PojoWithMultiplePojos>> ti =
        new TupleTypeInfo<>(
            BasicTypeInfo.INT_TYPE_INFO,
            TypeExtractor.getForClass(Pojo1.class),
            TypeExtractor.getForClass(PojoWithMultiplePojos.class)
        );

    ek = new ExpressionKeys<>(new String[] {"f2.p1.*", "f0"}, ti);

    public static class Pojo1 {
        public String a;
        public String b;
    }

    public static class Pojo2 {
        public String a2;
        public String b2;
    }

    public static class PojoWithMultiplePojos {
        public Pojo1 p1;
        public Pojo2 p2;
        public Integer i0;
    }
    ```

    What should be the output of `ek.getOriginalKeyFieldTypes()`?


> Range partitioning lacks support to define sort orders
> ------------------------------------------------------
>
>                 Key: FLINK-3665
>                 URL: https://issues.apache.org/jira/browse/FLINK-3665
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataSet API
>    Affects Versions: 1.0.0
>            Reporter: Fabian Hueske
>             Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of
> fields. This is fine if range partitioning is used to reduce skewed
> partitioning.
> However, it is not sufficient if range partitioning is used to sort a data
> set in parallel.
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily
> changed, I propose to add a method {{withOrders(Order... orders)}} to
> {{PartitionOperator}}. The method should throw an exception if the
> partitioning method of {{PartitionOperator}} is not range partitioning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
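
For illustration only, the behaviour proposed in the issue description (a withOrders(Order... orders) method that throws unless the operator uses range partitioning) could look roughly like the sketch below. It does not use Flink: SketchOrder, SketchPartitionMethod and SketchPartitionOperator are hypothetical stand-ins, not the classes from the pull request. The check that the number of orders matches the number of key fields is an extra assumption that the issue does not state, and the sketch does not settle how wildcard key expressions should be counted.

```java
// Hypothetical stand-ins for illustration; these are NOT Flink's classes.
enum SketchOrder { ASCENDING, DESCENDING }

enum SketchPartitionMethod { HASH, RANGE, REBALANCE, CUSTOM }

final class SketchPartitionOperator {
    private final SketchPartitionMethod method;
    private final int numKeyFields;
    private SketchOrder[] orders; // null until withOrders() is called

    SketchPartitionOperator(SketchPartitionMethod method, int numKeyFields) {
        this.method = method;
        this.numKeyFields = numKeyFields;
    }

    // Mirrors the proposal: only valid for range partitioning.
    SketchPartitionOperator withOrders(SketchOrder... orders) {
        if (method != SketchPartitionMethod.RANGE) {
            throw new IllegalStateException(
                "Orders can only be set for range partitioning, but method is " + method);
        }
        // Assumption (not stated in the issue): one order per key field.
        if (orders.length != numKeyFields) {
            throw new IllegalArgumentException(
                "Expected " + numKeyFields + " orders, got " + orders.length);
        }
        this.orders = orders.clone(); // defensive copy of the varargs array
        return this;                  // allow fluent chaining
    }

    SketchOrder[] getOrders() {
        return orders == null ? null : orders.clone();
    }
}
```

In this sketch, a range-partitioned operator with two key fields accepts withOrders(SketchOrder.ASCENDING, SketchOrder.DESCENDING), while the same call on a hash-partitioned operator fails with an IllegalStateException.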