Sounds like adding a round-robin partitioner to the set of readily available partitioners would make sense.
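Something along these lines might do as a starting point (a rough sketch against the FlinkKafkaPartitioner base class from the flink-connector-kafka module; the class name here is made up, not an existing Flink class):

    import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;

    // Sketch: cycle through the target topic's partitions on each record.
    // Each sink subtask keeps its own counter, so subtasks round-robin
    // independently of one another.
    public class RoundRobinKafkaPartitioner<T> extends FlinkKafkaPartitioner<T> {

        private int next = 0;

        @Override
        public int partition(T record, byte[] key, byte[] value,
                             String targetTopic, int[] partitions) {
            int p = partitions[next];
            next = (next + 1) % partitions.length;
            return p;
        }
    }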
On Fri, Dec 1, 2017 at 5:16 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> Hi Mike,
>
> The rationale behind implementing the FlinkFixedPartitioner as the default
> is that each Flink sink partition (i.e. one sink parallel subtask) maps
> to a single Kafka partition.
>
> One other thing to clarify:
> by setting the partitioner to null, the partitioning is based on a hash of
> the record's attached key (the key retrieved from the
> `SerializationSchema`), not round-robin.
> To use round-robin partitioning, a custom partitioner must be provided.
> Note, however, that a round-robin partitioner will open network
> connections to all Kafka brokers from every Flink sink parallel subtask,
> which can add up to quite a lot of connections.
>
> To conclude, I think the appropriate partitioning scheme depends on the
> actual use case.
> For example, for a simple Flink job that only filters data and has no
> aggregation within the pipeline, key-hash-based partitioning would
> probably be more suitable.
> For more complex pipelines that already partition the computation by key,
> a direct mapping of Flink sink partitions to Kafka partitions could make
> sense.
>
> On the other hand, considering that the key for each record is always
> re-calculated by the `SerializationSchema` in each Flink Kafka Producer
> sink partition, it might make sense to make the key-hash partitioner the
> default instead.
>
> Cheers,
> Gordon
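For completeness, wiring up either scheme could look roughly like this (a sketch assuming the FlinkKafkaProducer010 constructors; the topic name and the key-attaching schema are placeholder choices, and it reuses the hypothetical RoundRobinKafkaPartitioner from above):

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
    import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;
    import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

    public class PartitioningExamples {

        // Placeholder schema that uses the record itself as the Kafka key,
        // so the null-partitioner case has a key to hash.
        private static final KeyedSerializationSchema<String> SCHEMA =
                new KeyedSerializationSchema<String>() {
                    @Override
                    public byte[] serializeKey(String element) {
                        return element.getBytes(StandardCharsets.UTF_8);
                    }

                    @Override
                    public byte[] serializeValue(String element) {
                        return element.getBytes(StandardCharsets.UTF_8);
                    }

                    @Override
                    public String getTargetTopic(String element) {
                        return null; // fall back to the topic given to the producer
                    }
                };

        public static FlinkKafkaProducer010<String> keyHashSink(Properties props) {
            // Null custom partitioner: Kafka itself partitions each record
            // by a hash of the key attached by the schema above. The cast
            // disambiguates between the overloaded constructors.
            return new FlinkKafkaProducer010<>("my-topic", SCHEMA, props,
                    (FlinkKafkaPartitioner<String>) null);
        }

        public static FlinkKafkaProducer010<String> roundRobinSink(Properties props) {
            // Explicit round-robin partitioner: spreads records evenly, at
            // the cost of every sink subtask connecting to every broker.
            return new FlinkKafkaProducer010<>("my-topic", SCHEMA, props,
                    new RoundRobinKafkaPartitioner<String>());
        }
    }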