[ 
https://issues.apache.org/jira/browse/FLINK-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074847#comment-16074847
 ] 

Xingcan Cui commented on FLINK-6936:
------------------------------------

Hi [~aljoscha], sorry for the late reply. It took me a while to get 
familiar with Flink's state/checkpoint mechanism. Please correct me if I have 
missed or misunderstood something.

In summary, I partially agree with you: more than the new interface itself 
needs to be considered, and state management is probably the most relevant 
part. However, this arguably should have been considered before the existing 
{{Partitioner}} was added, since in this respect there is no difference 
between a single target and multiple targets.

In my view, each partitioning method should have at least one corresponding 
state type, e.g., keyed partitioner <=> (List/Map) keyed state and 
broadcasting <=> union state. Custom partitioning, however, has no 
corresponding state support: when the partitioning scheme changes at runtime 
(which is possible with the existing {{Partitioner}} interface), there is no 
way to inform the "state manager" and perform state migration with a 
user-defined migration handler (analogous to the custom {{Partitioner}}).

Coincidentally, [~tzulitai] has just posted a FLIP about "Eager State". It may 
be a chance to refactor the hierarchy of state-related code (like the 
proposal in [FLINK-6849|https://issues.apache.org/jira/browse/FLINK-6849]) by 
reconsidering the inherent relationship between partitioning methods and 
state support, e.g., by adding a mechanism that lets users actively trigger a 
state migration, rather than the current passive approach.

As a beginner, I am not sure whether the above has already been discussed or 
considered. But as a user, I think these points are essential to making Flink 
a general-purpose processing platform. What do you think? [~aljoscha] [~tzulitai]

> Add multiple targets support for custom partitioner
> ---------------------------------------------------
>
>                 Key: FLINK-6936
>                 URL: https://issues.apache.org/jira/browse/FLINK-6936
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Xingcan Cui
>            Assignee: Xingcan Cui
>            Priority: Minor
>
> The current user-facing Partitioner only allows returning one target.
> {code:java}
> @Public
> public interface Partitioner<K> extends java.io.Serializable, Function {
>       /**
>        * Computes the partition for the given key.
>        *
>        * @param key The key.
>        * @param numPartitions The number of partitions to partition into.
>        * @return The partition index.
>        */
>       int partition(K key, int numPartitions);
> }
> {code}
> Arguably, this function should be able to return multiple partitions; the 
> single-value return type may be a historical legacy.
> There are at least three possible approaches to solve this.
> # Make the `protected DataStream<T> setConnectionType(StreamPartitioner<T> 
> partitioner)` method in DataStream public, so that users can directly 
> define a StreamPartitioner.
> # Change the `partition` method in the Partitioner interface to return an int 
> array instead of a single int value.
> # Add a new `multicast` method to DataStream and provide a MultiPartitioner 
> interface which returns an int array.
> Considering the consistency of the API, the 3rd approach seems to be an 
> acceptable choice. [~aljoscha], what do you think?
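
To make the 3rd approach more concrete, here is a minimal, self-contained 
sketch of what a multi-target partitioner could look like. It does not depend 
on Flink, and every name in it is an assumption for illustration only: 
{{MultiPartitioner}} is the interface proposed above (it does not exist in 
Flink), and {{HashAndNeighborPartitioner}} and {{MultiPartitionerDemo}} are 
hypothetical examples. How a {{multicast}} method on DataStream would consume 
such an interface is left open here.

```java
import java.io.Serializable;
import java.util.Arrays;

/**
 * Hypothetical counterpart of Partitioner (not part of Flink):
 * it returns an array of target partitions instead of a single index.
 */
interface MultiPartitioner<K> extends Serializable {
    /**
     * @param key The key.
     * @param numPartitions The number of partitions to partition into.
     * @return The indexes of all target partitions.
     */
    int[] partition(K key, int numPartitions);
}

/** Illustrative implementation: the hash partition plus its right neighbor. */
class HashAndNeighborPartitioner implements MultiPartitioner<String> {
    @Override
    public int[] partition(String key, int numPartitions) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int primary = Math.floorMod(key.hashCode(), numPartitions);
        if (numPartitions == 1) {
            // Only one partition exists, so there is no distinct neighbor.
            return new int[] {primary};
        }
        return new int[] {primary, (primary + 1) % numPartitions};
    }
}

class MultiPartitionerDemo {
    public static void main(String[] args) {
        MultiPartitioner<String> partitioner = new HashAndNeighborPartitioner();
        // Prints the two target partitions chosen for the key "flink".
        System.out.println(Arrays.toString(partitioner.partition("flink", 4)));
    }
}
```

Returning an int array keeps the shape of the existing {{partition}} method, 
so a single-target partitioner is just the special case of a one-element 
array.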



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
