Artem Livshits created KAFKA-14156:
--------------------------------------

             Summary: Built-in partitioner may create suboptimal batches with large linger.ms
                 Key: KAFKA-14156
                 URL: https://issues.apache.org/jira/browse/KAFKA-14156
             Project: Kafka
          Issue Type: Improvement
          Components: producer 
    Affects Versions: 3.3.0
            Reporter: Artem Livshits


The new built-in "sticky" partitioner switches partitions based on the number 
of bytes produced to a partition.  It doesn't use batch creation as a switch 
trigger.  The previous "sticky" DefaultPartitioner switched partitions when a 
new batch was created; with a small linger.ms (the default is 0), this could 
result in sending larger batches to slower brokers, potentially overloading 
them.  See 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
for more detail.
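The two switch triggers can be contrasted in a small Java sketch. The class and method names (StickySwitchTriggers, ByteCountTrigger, NewBatchTrigger, newBatchCreated) are hypothetical, and this is not the Kafka implementation: Kafka chooses the next partition differently, while this sketch uses round-robin so the output is deterministic.

```java
/**
 * Hypothetical sketch contrasting two sticky-partition switch triggers.
 * Not Kafka code: the next partition is chosen round-robin here for
 * determinism, whereas Kafka's partitioners choose it differently.
 */
public class StickySwitchTriggers {

    /** New built-in style: switch after batchSize bytes went to the current partition. */
    static final class ByteCountTrigger {
        private final int numPartitions;
        private final int batchSize;
        private int current = 0;
        private long producedBytes = 0;

        ByteCountTrigger(int numPartitions, int batchSize) {
            this.numPartitions = numPartitions;
            this.batchSize = batchSize;
        }

        /** Returns the partition for a record of the given size. */
        int partition(int recordBytes) {
            if (producedBytes >= batchSize) {       // threshold reached: move on
                current = (current + 1) % numPartitions;
                producedBytes = 0;
            }
            producedBytes += recordBytes;
            return current;
        }
    }

    /** Previous DefaultPartitioner style: switch whenever a new batch is needed. */
    static final class NewBatchTrigger {
        private final int numPartitions;
        private int current = 0;

        NewBatchTrigger(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        int partition() {
            return current;
        }

        /** Hypothetical hook, called when the current partition needs a new batch. */
        void newBatchCreated() {
            current = (current + 1) % numPartitions;
        }
    }

    public static void main(String[] args) {
        ByteCountTrigger byBytes = new ByteCountTrigger(3, 16 * 1024);
        int[] partitions = new int[20];
        for (int i = 0; i < partitions.length; i++) {
            partitions[i] = byBytes.partition(1024); // twenty 1KB records
        }
        // 16 records (16KB) go to partition 0, the remaining 4 to partition 1.
        System.out.println(java.util.Arrays.toString(partitions));

        NewBatchTrigger byBatch = new NewBatchTrigger(3);
        int before = byBatch.partition();
        byBatch.newBatchCreated();
        System.out.println(before + " -> " + byBatch.partition()); // prints "0 -> 1"
    }
}
```

Note that the byte-count trigger never looks at batches at all, which is exactly why it can switch while a batch is still lingering.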

However, with a large linger.ms, the new built-in partitioner may create 
suboptimal batches.  Consider an example: suppose linger.ms=500, 
batch.size=16KB (the default), and we produce 24KB/sec, i.e. every 500ms we 
produce 12KB worth of data.  The new built-in partitioner would switch 
partitions every 16KB, so we could get into the following batching pattern:
 * produce 12KB to one partition in 500ms, hit linger, send a 12KB batch
 * produce 4KB more to the same partition; now we've produced 16KB of data, 
so switch partitions
 * produce 12KB to the second partition in 500ms, hit linger, send a 12KB batch
 * in the meantime, the 4KB produced to the first partition would hit linger as 
well, sending a 4KB batch
 * produce 4KB more to the second partition; now we've produced 16KB of data to 
the second partition, so switch to the third partition

So in this scenario the new built-in partitioner produces a mix of 12KB and 4KB 
batches, while the previous DefaultPartitioner would produce only 12KB batches: 
it switches on new batch creation, so there are no "mid-linger" leftover 
batches.
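The batching pattern above can be reproduced with a toy tick-based simulation. Everything in it is a hypothetical model rather than Kafka code: a tick is 1/24 s and carries one 1KB record (24KB/sec), so linger.ms=500 is 12 ticks; batch.size is 16KB; there are 3 partitions picked round-robin; and a batch is sent when its linger expires or when it is full.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy tick-based simulation of the scenario above; not Kafka code.
 * Hypothetical assumptions: one 1KB record per tick, a tick is 1/24 s
 * (24KB/sec), so linger.ms=500 is 12 ticks; batch.size=16KB;
 * 3 partitions; the next partition is picked round-robin.
 */
public class LingerBatchingSim {
    static final int BATCH_SIZE_KB = 16;
    static final int LINGER_TICKS = 12;
    static final int PARTITIONS = 3;

    enum Trigger { BYTES_PRODUCED, NEW_BATCH }

    /** Returns the sizes (KB) of the batches sent during the run, in send order. */
    static List<Integer> simulate(Trigger trigger, int ticks) {
        int[] openKb = new int[PARTITIONS];    // size of the open batch; 0 means no batch
        int[] createdAt = new int[PARTITIONS]; // tick at which the open batch was created
        List<Integer> sent = new ArrayList<>();
        int current = 0;
        int producedKb = 0;                    // KB sent to `current` since the last switch

        for (int t = 0; t < ticks; t++) {
            // 1. Send every batch whose linger time has expired.
            for (int p = 0; p < PARTITIONS; p++) {
                if (openKb[p] > 0 && t - createdAt[p] >= LINGER_TICKS) {
                    sent.add(openKb[p]);
                    openKb[p] = 0;
                }
            }
            // 2. Decide whether to switch partitions before this record.
            boolean needsNewBatch = openKb[current] == 0;
            if ((trigger == Trigger.BYTES_PRODUCED && producedKb >= BATCH_SIZE_KB)
                    || (trigger == Trigger.NEW_BATCH && needsNewBatch)) {
                current = (current + 1) % PARTITIONS;
                producedKb = 0;
            }
            // 3. Append the 1KB record, opening a batch if necessary.
            if (openKb[current] == 0) {
                createdAt[current] = t;
            }
            openKb[current] += 1;
            producedKb += 1;
            if (openKb[current] >= BATCH_SIZE_KB) { // a full batch is sent right away
                sent.add(openKb[current]);
                openKb[current] = 0;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        // 96 ticks = 4 seconds of traffic.
        System.out.println("switch on bytes produced: " + simulate(Trigger.BYTES_PRODUCED, 96));
        System.out.println("switch on new batch:      " + simulate(Trigger.NEW_BATCH, 96));
    }
}
```

Over 96 ticks (4 seconds) the bytes-produced trigger sends alternating 12KB and 4KB batches ([12, 4, 12, 4, ..., 12]), while the new-batch trigger sends only 12KB batches.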

To avoid batch fragmentation on partition switch, we can wait until the batch 
is ready before switching the partition, i.e. the condition to switch to a new 
partition would be "produced batch.size bytes" AND "batch is not lingering".  
This may introduce some non-uniformity into data distribution, but unlike the 
previous DefaultPartitioner, the non-uniformity would not be based on broker 
performance and won't re-introduce the bad pattern of sending more data to 
slower brokers.
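The proposed condition can be checked with the same kind of toy model. The names (LingerAwareSwitchSim, shouldSwitch) and all model parameters are hypothetical: one 1KB record per 1/24 s tick, a linger of 12 ticks (500ms), batch.size=16KB, and 3 partitions picked round-robin. It is a sketch of the idea, not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the proposed rule: switch only after batch.size
 * bytes went to the current partition AND that partition has no batch
 * still lingering. Toy tick model, not Kafka code.
 */
public class LingerAwareSwitchSim {
    static final int BATCH_SIZE_KB = 16;
    static final int LINGER_TICKS = 12;
    static final int PARTITIONS = 3;

    /** The proposed switch condition. */
    static boolean shouldSwitch(int producedKb, boolean batchLingering) {
        return producedKb >= BATCH_SIZE_KB && !batchLingering;
    }

    /** Returns the sizes (KB) of the batches sent, in send order. */
    static List<Integer> simulate(int ticks) {
        int[] openKb = new int[PARTITIONS];    // 0 means the partition has no open batch
        int[] createdAt = new int[PARTITIONS]; // tick at which the open batch was created
        List<Integer> sent = new ArrayList<>();
        int current = 0;
        int producedKb = 0;                    // KB sent to `current` since the last switch

        for (int t = 0; t < ticks; t++) {
            for (int p = 0; p < PARTITIONS; p++) { // send batches whose linger expired
                if (openKb[p] > 0 && t - createdAt[p] >= LINGER_TICKS) {
                    sent.add(openKb[p]);
                    openKb[p] = 0;
                }
            }
            if (shouldSwitch(producedKb, openKb[current] > 0)) {
                current = (current + 1) % PARTITIONS;
                producedKb = 0;
            }
            if (openKb[current] == 0) {
                createdAt[current] = t;
            }
            openKb[current] += 1;                  // append one 1KB record
            producedKb += 1;
            if (openKb[current] >= BATCH_SIZE_KB) { // full batches are sent immediately
                sent.add(openKb[current]);
                openKb[current] = 0;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(simulate(96)); // [12, 12, 12, 12, 12, 12, 12]
    }
}
```

In this run every batch is 12KB, and each partition receives 24KB per turn instead of 16KB: how much a partition receives before a switch now depends on linger timing, which illustrates where the non-uniformity mentioned above could come from.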



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
