[ https://issues.apache.org/jira/browse/FLINK-18629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161829#comment-17161829 ]
Dawid Wysakowicz edited comment on FLINK-18629 at 7/21/20, 7:48 AM:
--------------------------------------------------------------------

The reason is extreme conservatism. The option with a common KEY type theoretically disallows a potential use case where the keys are of different types in the same hierarchy. E.g.
{code}
class A {}
class B extends A {}

ConnectedStreams<A, B> connectedStreams = stream1.connect(stream2)
    .keyBy(v -> v, v -> v);
{code}
An important fact here is that both key types would have to produce the same TypeInformation, which is not easy to achieve (though it is possible through TypeInfoFactories).

But I am more than happy to use the version with a single key type.
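As background for the choice of signature: the issue description attributes the bug to the wildcard key type being evaluated as Object for lambdas. A minimal, Flink-free Java sketch of that effect follows. The {{KeySelector}} interface here is a hypothetical stand-in with the same shape as Flink's, and {{implSignature}} is an illustrative helper, not Flink's type extraction; it only reads the signature that the compiler gave the lambda's synthetic method.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

class LambdaKeyTypeSketch {

    // Hypothetical stand-in for Flink's KeySelector. It is Serializable so that
    // lambdas implementing it expose a writeReplace() method we can inspect.
    interface KeySelector<IN, KEY> extends Serializable {
        KEY getKey(IN value) throws Exception;
    }

    // Mirrors the shape of the old signature: the key type is a wildcard.
    static String wildcardSignature(KeySelector<Integer, ?> selector) {
        return implSignature(selector);
    }

    // Mirrors the shape of the proposed signature: the key type is a type variable
    // that the compiler infers from the lambda body.
    static <K> String typeVariableSignature(KeySelector<Integer, K> selector) {
        return implSignature(selector);
    }

    // Returns the JVM signature of the synthetic method backing a serializable lambda.
    static String implSignature(Object lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
            return serialized.getImplMethodSignature();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Not a serializable lambda", e);
        }
    }

    public static void main(String[] args) {
        // Same lambda body, different target types.
        System.out.println("wildcard key type: " + wildcardSignature(v -> v));
        System.out.println("type variable key: " + typeVariableSignature(v -> v));
    }
}
```

With javac, the first call should report a return type of {{java.lang.Object}} and the second {{java.lang.Integer}}, since the wildcard target type gives the lambda the function type of {{KeySelector<Integer, Object>}}. That return type is all a reflection-based extractor can recover from the lambda.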
> ConnectedStreams#keyBy can not derive key TypeInformation for lambda KeySelectors
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-18629
>                 URL: https://issues.apache.org/jira/browse/FLINK-18629
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.2
>
>
> The following test fails:
> {code}
> @Test
> public void testKeyedConnectedStreamsType() {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>     DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
>     DataStreamSource<Integer> stream2 = env.fromElements(1, 2);
>     ConnectedStreams<Integer, Integer> connectedStreams = stream1.connect(stream2)
>         .keyBy(v -> v, v -> v);
>     KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) connectedStreams.getFirstInput();
>     KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) connectedStreams.getSecondInput();
>     assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
>     assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
> }
> {code}
> The problem is that the wildcard type is evaluated as {{Object}} for lambdas,
> which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector
> provided as a lambda.
> I suggest changing the method signature to:
> {code}
> public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
>         KeySelector<IN1, K1> keySelector1,
>         KeySelector<IN2, K2> keySelector2)
> {code}
> This would be a code-compatible change. It might break state backend
> compatibility, because it would change the derived key type info.
> Still, old programs could work around this by using the second method:
> {code}
> public <KEY> ConnectedStreams<IN1, IN2> keyBy(
>         KeySelector<IN1, KEY> keySelector1,
>         KeySelector<IN2, KEY> keySelector2,
>         TypeInformation<KEY> keyType)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)