Hi Greg, IMO you are right. We should remove duplicate sort keys.
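
A minimal sketch of what dropping duplicate keys could look like before the positions reach the comparator. The class and method names (SortKeyDedup, removeDuplicateKeys) are hypothetical and not part of Flink's API; the sketch only assumes that key positions arrive as an int[] such as {0, 0} and that the first occurrence of each position should win:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SortKeyDedup {

    // Hypothetical helper, not a Flink method: returns the key positions
    // with later duplicates removed, keeping first-occurrence order.
    static int[] removeDuplicateKeys(int[] keyPositions) {
        // LinkedHashSet ignores repeated elements and preserves insertion order.
        Set<Integer> seen = new LinkedHashSet<>();
        for (int pos : keyPositions) {
            seen.add(pos);
        }
        int[] result = new int[seen.size()];
        int i = 0;
        for (int pos : seen) {
            result[i++] = pos;
        }
        return result;
    }

    public static void main(String[] args) {
        // The case from the test: groupBy(0) followed by sortGroup(0, ...).
        System.out.println(Arrays.toString(removeDuplicateKeys(new int[] {0, 0})));
        // prints [0]
        System.out.println(Arrays.toString(removeDuplicateKeys(new int[] {2, 0, 2, 1, 0})));
        // prints [2, 0, 1]
    }
}
```

Keeping the first occurrence matters because the earlier position dominates the sort order; a repeated position later in the list can never change the comparison result.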

Best, Fabian

2016-10-03 16:04 GMT+02:00 Greg Hogan <c...@greghogan.com>:

> Is it correct to expect that Flink should remove duplicate sort keys? I'm
> working on instrumenting the FixedLengthRecordSorter (FLINK-4705) and the
> following test case from TypeHintITCase:200 is having an unexpected effect
> due to the keyPositions = {0, 0} being passed to TupleComparator.
>
> DataSet<Integer> resultDs = ds
>     .groupBy(0)
>     .sortGroup(0, Order.ASCENDING)
>     .reduceGroup(new GroupReducer<Tuple3<Integer, Long, String>, Integer>())
>     .returns(BasicTypeInfo.INT_TYPE_INFO);
>
> The sortGroup will have no effect since only one key is presented to the
> UDF at a time. Flink also makes no guarantees as to the order in which keys
> are presented to the UDF, which are sorted per partition. I would also
> expect repeat keys in groupBy to be ignored.
>
> Greg