[ https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064998#comment-16064998 ]
Aljoscha Krettek commented on FLINK-6936:
-----------------------------------------

[~xccui] I agree with you that having an extra {{KeySelector}} when using a custom {{Partitioner}} seems unnecessary. The only purpose of a {{KeySelector}} is to enable keyed state, which you cannot use if you have a custom partitioner.

Do you have a design document about how you want to approach the stream-stream theta join? I would be especially interested in how you want to deal with fault tolerance, i.e. how the in-flight data is buffered while waiting for events to join with.

> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>
>                 Key: FLINK-6936
>                 URL: https://issues.apache.org/jira/browse/FLINK-6936
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Xingcan Cui
>            Assignee: Xingcan Cui
>            Priority: Minor
>
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
>     /**
>      * Computes the partition for the given key.
>      *
>      * @param key The key.
>      * @param numPartitions The number of partitions to partition into.
>      * @return The partition index.
>      */
>     int partition(K key, int numPartitions);
> }
> {code}
> Actually, this function should return multiple partitions and this may be a
> historical legacy.
> There could be at least three approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T>
> partitioner)` method in DataStream public and that allows users to directly
> define StreamPartitioner.
> # Change the `partition` method in the Partitioner interface to return an int
> array instead of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner
> interface which returns an int array.
> Considering the consistency of API, the 3rd approach seems to be an
> acceptable choice. [~aljoscha], what do you think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
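
For illustration only, the third approach quoted above could look roughly like the following. This is a sketch under assumptions, not Flink code: the issue names a MultiPartitioner interface but gives no signature, so the method shape here simply mirrors the existing Partitioner with an int array as return type. ReplicatingPartitioner and MultiPartitionerSketch are hypothetical names introduced for the example, and the Flink `Function` super-interface is left out so the snippet compiles without Flink on the classpath.

```java
// Hypothetical interface modelled on approach 3 of the issue; not part of Flink's API.
interface MultiPartitioner<K> extends java.io.Serializable {
    /**
     * Computes every partition that should receive the given key.
     *
     * @param key The key.
     * @param numPartitions The number of partitions to partition into.
     * @return The indices of all target partitions.
     */
    int[] partition(K key, int numPartitions);
}

// Hypothetical implementation: sends each key to `replicas` consecutive partitions,
// starting at the partition selected by the key's hash code.
class ReplicatingPartitioner<K> implements MultiPartitioner<K> {
    private static final long serialVersionUID = 1L;
    private final int replicas;

    ReplicatingPartitioner(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        this.replicas = replicas;
    }

    @Override
    public int[] partition(K key, int numPartitions) {
        // Never return more targets than there are partitions.
        int count = Math.min(replicas, numPartitions);
        // floorMod keeps the index non-negative for negative hash codes.
        int first = Math.floorMod(key.hashCode(), numPartitions);
        int[] targets = new int[count];
        for (int i = 0; i < count; i++) {
            targets[i] = (first + i) % numPartitions;
        }
        return targets;
    }
}

class MultiPartitionerSketch {
    public static void main(String[] args) {
        MultiPartitioner<Integer> partitioner = new ReplicatingPartitioner<>(2);
        // Key 3 with 4 partitions starts at index 3 and wraps around to 0.
        System.out.println(java.util.Arrays.toString(partitioner.partition(3, 4)));
    }
}
```

How a `multicast` method on DataStream would consume such an interface is not specified in the issue and is not sketched here.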