Meg,

What version are your clients, and what partitioner are you using for these
records?

If you're using the DefaultPartitioner from 2.4.0+, it has a known
imbalance flaw that is described and addressed by this KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
which was released in 3.3.0.
To make sure you're using the patched partitioner, the clients jar should be
on 3.3.0+ and your application should leave the `partitioner.class`
configuration unset, so that the producer uses its built-in partitioning logic.
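
As a rough illustration of that second point, here is a minimal sketch of
producer settings that leave `partitioner.class` unset. It uses plain
`java.util.Properties` so it runs without the Kafka jar. The class name, the
helper names, the broker address and the choice of serializer are all
placeholders I made up for the example, and your real configuration will have
more in it.

```java
import java.util.Properties;

// Sketch only: ProducerConfigSketch and its helpers are hypothetical names.
final class ProducerConfigSketch {
    // Real producer config key; setting it overrides the built-in partitioning.
    static final String PARTITIONER_CLASS = "partitioner.class";

    // Builds baseline producer settings. The serializer choice is an assumption.
    static Properties baseProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Deliberately no "partitioner.class" entry, so a 3.3.0+ client
        // falls back to its built-in partitioning logic.
        return props;
    }

    // True if the application has pinned a partitioner explicitly.
    static boolean overridesPartitioner(Properties props) {
        return props.containsKey(PARTITIONER_CLASS);
    }

    public static void main(String[] args) {
        // Placeholder address, not taken from the cluster in this thread.
        Properties props = baseProducerProps("broker-0.example:9092");
        if (overridesPartitioner(props)) {
            throw new IllegalStateException(
                    "partitioner.class is set: " + props.getProperty(PARTITIONER_CLASS));
        }
        System.out.println("partitioner.class is unset");
    }
}
```

A check like `overridesPartitioner` is worth running against the effective
configuration, since the override can come from a shared properties file
rather than from the application code itself.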

In the short term, pausing, throttling, or restarting producers may help
resolve the imbalance, since the poor balance is caused by the state of the
producer buffers.
Adding nodes to the cluster and spreading partitions thinner may also help
increase the tolerance of each broker before it becomes unbalanced.
However, this will not solve the problem on its own, and may make it
temporarily worse while partitions are being replicated to the added nodes.
If you're already running the patched version of the partitioner, then a
more detailed investigation will be necessary.
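
As a possible starting point for that investigation, the skew can be put into
numbers from two snapshots of per-partition end offsets taken some time apart.
The sketch below is only an assumption about how you might do that: the class
and method names are hypothetical, the offsets in `main` are made up, and it
expects you to collect the snapshots yourself with whatever tool you already
use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: PartitionSkewSketch is a hypothetical helper, not a Kafka API.
final class PartitionSkewSketch {
    // For each partition present in both snapshots, returns the number of
    // records appended between them divided by the mean across partitions.
    // A value of 2.0 means the partition received twice the average.
    static Map<Integer, Double> relativeRates(Map<Integer, Long> before,
                                              Map<Integer, Long> after) {
        Map<Integer, Long> deltas = new LinkedHashMap<>();
        long total = 0;
        for (Map.Entry<Integer, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            if (start == null) {
                continue; // partition missing from the first snapshot
            }
            long delta = e.getValue() - start;
            deltas.put(e.getKey(), delta);
            total += delta;
        }
        Map<Integer, Double> ratios = new LinkedHashMap<>();
        if (deltas.isEmpty() || total == 0) {
            return ratios; // nothing was produced, so there is no ratio to report
        }
        double mean = (double) total / deltas.size();
        for (Map.Entry<Integer, Long> e : deltas.entrySet()) {
            ratios.put(e.getKey(), e.getValue() / mean);
        }
        return ratios;
    }

    public static void main(String[] args) {
        // Made-up end offsets for three partitions, for illustration only.
        Map<Integer, Long> before = Map.of(0, 1_000L, 1, 1_000L, 2, 1_000L);
        Map<Integer, Long> after = Map.of(0, 1_100L, 1, 1_100L, 2, 1_200L);
        relativeRates(before, after).forEach((partition, ratio) ->
                System.out.printf("partition %d: %.2fx mean%n", partition, ratio));
    }
}
```

Grouping the ratios by the broker that leads each partition would show
whether the extra load follows particular partitions or the broker as a whole.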

I hope some of this helps!
Greg Harris

On Fri, Mar 24, 2023 at 11:57 AM Margaret Figura
<margaret.fig...@infovista.com.invalid> wrote:

> Hi,
>
> We have a 22-node Kafka 3.3.1 cluster on K8s. All data is sent with null
> partitionId and null key from 20 Java producers, so it should be
> distributed evenly across partitions. All was good for days, but a couple
> hours ago, broker 21 started receiving about 2x the data of the other
> brokers for a few topics (but not all). These topics are all 1x replicated
> and the 96 partitions are distributed evenly across brokers (each broker
> has 4 or 5 partitions). This was detected in Grafana, but I can also see
> the offsets increasing much faster for the partitions owned by broker 21 in
> KafkaOffsetsShell. What could cause this? I didn't see anything unusual in
> the broker 21 logs or the controller logs.
>
> Looking back, I noticed that broker 11 also becomes a bit unbalanced each
> day at the time when we are processing the most data, but it is only 10-15%
> higher than the others. All other brokers are quite even, including broker
> 21 until today.
>
> Any ideas on what I can check? Unfortunately we'll probably have to
> restart Kafka and/or the producers pretty soon.
>
> Thanks a lot!
> Meg
>
