Hi Tai, OK, thanks for confirming. I understand that streaming shuffle
is cheaper than batch-spill shuffle, but nevertheless may be unacceptable in
large volume applications - it is still a network shuffle and that's
the biggest part of the cost.
Now on the point of the trade off between load-bala
Hi Michal,
I see, thanks for the description. I think you’ve definitely raised an
interesting point.
Yes, there may be unnecessary shuffle in this case (the partitionCustom on
consumed DataStream doesn’t override the assignPartitions in the Kafka
connector; custom partitioners are applied after th
Hi Tai, I was referring to co-partitioning, not co-location of leaders,
i.e. multiple topics that share the same partitioning scheme. By example,
say I have 2 topics which share the same keyspace and which are produced by
something other than Flink using identical partitioner. The data in these 2
t
Hi Michal,
Whether or not the external system's partitioning scheme is referenced when
assigning
partitions to the consumer parallel subtasks depends on the implementation
of each connector / source.
First, clarification on “co-partitioning": from your context I’m assuming
you’re referring to co-
Hi, I was recently looking into a kafka connector issue (FLINK-4023 /
FLINK-4069), when it was pointed out that partition assignment will not be
deterministic if the partition discovery is imply moved to the open()
method. In the assignPartitions of FlinkKafkaConsumerBase a modulo on the
__index__