Sounds like adding a round-robin partitioner to the set of readily
available partitioners would make sense.
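
As a starting point, here is a minimal sketch of what such a partitioner
could look like against the FlinkKafkaPartitioner interface (the class name
is made up; treat this as an illustration, not a finished implementation):

    import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;

    public class RoundRobinKafkaPartitioner<T> extends FlinkKafkaPartitioner<T> {

        private int next = 0;

        @Override
        public void open(int parallelInstanceId, int parallelInstances) {
            // stagger the starting partition per subtask so all subtasks
            // don't begin writing to the same Kafka partition
            this.next = parallelInstanceId;
        }

        @Override
        public int partition(T record, byte[] key, byte[] value,
                             String targetTopic, int[] partitions) {
            int chosen = partitions[next % partitions.length];
            next = (next + 1) % partitions.length;
            return chosen;
        }
    }

As Gordon points out below, each subtask would then write to every
partition, so the number of broker connections grows with the parallelism.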

On Fri, Dec 1, 2017 at 5:16 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Hi Mike,
>
> The rationale behind making the FlinkFixedPartitioner the default is that
> each Flink sink partition (i.e. one sink parallel subtask) maps to a
> single Kafka partition.
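>
> As a rough sketch, that fixed mapping comes down to the subtask index,
> along these lines (simplified, not the exact source):
>
>     // each parallel subtask sticks to one Kafka partition
>     @Override
>     public int partition(T record, byte[] key, byte[] value,
>                          String targetTopic, int[] partitions) {
>         // parallelInstanceId is the subtask index, captured in open()
>         return partitions[parallelInstanceId % partitions.length];
>     }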
>
> One other thing to clarify:
> If the partitioner is set to null, partitioning is based on a hash of the
> record's attached key (the key retrieved from the `SerializationSchema`),
> not round-robin.
> To use round-robin partitioning, a custom partitioner has to be provided.
> Note, however, that a round-robin partitioner opens network connections
> from every Flink sink parallel subtask to all Kafka brokers, which can add
> up to quite a lot of connections.
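>
> To make that concrete, the scheme is chosen when the producer is
> constructed; a sketch with the 0.10 producer ("my-topic", `schema`,
> `props`, `MyEvent`, and `MyRoundRobinPartitioner` are placeholders):
>
>     // default: FlinkFixedPartitioner, one subtask -> one partition
>     new FlinkKafkaProducer010<MyEvent>("my-topic", schema, props);
>
>     // null partitioner: fall back to hashing the record's attached key
>     new FlinkKafkaProducer010<>("my-topic", schema, props,
>             (FlinkKafkaPartitioner<MyEvent>) null);
>
>     // explicit custom partitioner, e.g. a round-robin implementation
>     new FlinkKafkaProducer010<>("my-topic", schema, props,
>             new MyRoundRobinPartitioner<MyEvent>());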
>
> To conclude: the appropriate partitioning scheme depends on the actual use
> case.
> For example, for a simple Flink job that only does some filtering of data
> and has no aggregation within the pipeline, the key-hash-based
> partitioning would probably be the better fit.
> For more complex pipelines that already partition the computation by key,
> a direct mapping of a Flink sink partition to a Kafka partition could make
> sense.
>
> On the other hand, considering that the key for each record is always
> re-calculated by the `SerializationSchema` in each Flink Kafka Producer
> sink partition, it might actually make sense to make the key-hash
> partitioner the default instead.
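>
> For reference, with the keyed schema variant that key is whatever
> serializeKey() returns per record; a minimal sketch (`MyEvent` and its
> accessors are made-up placeholders):
>
>     import java.nio.charset.StandardCharsets;
>     import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
>
>     public class MyEventSchema implements KeyedSerializationSchema<MyEvent> {
>         @Override
>         public byte[] serializeKey(MyEvent e) {
>             // re-computed for every record, in every sink subtask
>             return e.getId().getBytes(StandardCharsets.UTF_8);
>         }
>
>         @Override
>         public byte[] serializeValue(MyEvent e) {
>             return e.toJson().getBytes(StandardCharsets.UTF_8);
>         }
>
>         @Override
>         public String getTargetTopic(MyEvent e) {
>             return null; // null means: use the producer's configured topic
>         }
>     }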
>
> Cheers,
> Gordon
