
Dawid Wysakowicz edited comment on FLINK-18629 at 7/21/20, 7:48 AM:

The reason is extreme conservatism. The option with a common KEY type 
theoretically rules out a potential use case in which the keys are of 
different types within the same hierarchy:

class A {};
class B extends A {};

ConnectedStreams<A, B> connectedStreams = stream1.connect(stream2)
        .keyBy(v -> v, v -> v);

The important point here is that both key types would have to produce the same 
TypeInformation, which is not easy to achieve (though it is possible through 
TypeInfoFactories). Still, I am more than happy to use the version with a 
single key type.

> ConnectedStreams#keyBy can not derive key TypeInformation for lambda 
> KeySelectors
> ---------------------------------------------------------------------------------
>                 Key: FLINK-18629
>                 URL: https://issues.apache.org/jira/browse/FLINK-18629
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>    Affects Versions: 1.10.0, 1.11.0, 1.12.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.10.2, 1.12.0, 1.11.2
> The following test fails:
> {code}
>     @Test
>     public void testKeyedConnectedStreamsType() {
>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>         DataStreamSource<Integer> stream1 = env.fromElements(1, 2);
>         DataStreamSource<Integer> stream2 = env.fromElements(1, 2);
>         ConnectedStreams<Integer, Integer> connectedStreams = stream1.connect(stream2)
>                 .keyBy(v -> v, v -> v);
>         KeyedStream<?, ?> firstKeyedInput = (KeyedStream<?, ?>) connectedStreams.getFirstInput();
>         KeyedStream<?, ?> secondKeyedInput = (KeyedStream<?, ?>) connectedStreams.getSecondInput();
>         assertThat(firstKeyedInput.getKeyType(), equalTo(Types.INT));
>         assertThat(secondKeyedInput.getKeyType(), equalTo(Types.INT));
>     }
> {code}
> The problem is that the wildcard type is evaluated as {{Object}} for lambdas, 
> which in turn produces {{GenericTypeInfo<Object>}} for any KeySelector 
> provided as a lambda.
> I suggest changing the method signature to:
> {code}
>       public <K1, K2> ConnectedStreams<IN1, IN2> keyBy(
>                       KeySelector<IN1, K1> keySelector1,
>                       KeySelector<IN2, K2> keySelector2)
> {code}
> This would be a code-compatible change. It might break state backend 
> compatibility, because the derived key type info would change. 
> There would still be a workaround for old programs, which is to use the 
> second method:
> {code}
>       public <KEY> ConnectedStreams<IN1, IN2> keyBy(
>                       KeySelector<IN1, KEY> keySelector1,
>                       KeySelector<IN2, KEY> keySelector2,
>                       TypeInformation<KEY> keyType)
> {code}
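
To illustrate the difference between a wildcard key type and a type variable, 
here is a minimal plain-Java sketch that does not depend on Flink. The 
KeySelector interface below is a hypothetical stand-in for Flink's interface, 
and implSignature is a helper invented for this example. It assumes javac and 
reads the signature of the synthetic method generated for a serializable 
lambda; it is not Flink's actual type extraction code.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

final class KeyTypeErasureDemo {

    /** Hypothetical stand-in for Flink's KeySelector; Serializable so the lambda can be inspected. */
    interface KeySelector<IN, KEY> extends Serializable {
        KEY getKey(IN value);
    }

    /** Returns the JVM signature of the synthetic method generated for a serializable lambda. */
    static String implSignature(Serializable lambda) {
        try {
            // Serializable lambdas carry a private writeReplace() that returns a SerializedLambda.
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda serialized = (SerializedLambda) writeReplace.invoke(lambda);
            return serialized.getImplMethodSignature();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("argument is not a serializable lambda", e);
        }
    }

    /** Mirrors a signature with a wildcard key type. */
    static String withWildcard(KeySelector<Integer, ?> selector) {
        return implSignature(selector);
    }

    /** Mirrors a signature with a named key type variable. */
    static <K> String withTypeVariable(KeySelector<Integer, K> selector) {
        return implSignature(selector);
    }

    public static void main(String[] args) {
        System.out.println(withWildcard(v -> v));
        System.out.println(withTypeVariable(v -> v));
    }
}
```

With javac, the wildcard variant should report a return type of 
java.lang.Object and the type-variable variant java.lang.Integer, which matches 
the behaviour described in the issue. Other compilers may generate different 
synthetic methods.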
