Hi,

We had a small cluster (4 brokers) handling very low throughput, a couple of
hundred messages per minute at the very most. In that cluster we had a little
under 3300 total consumers (all were Kafka Streams instances). All broker CPUs
were maxed out almost constantly for a few weeks.

We eventually switched traffic to a new cluster. After sitting idle for a few
days, the old cluster was at ~40% CPU with consumers still running. When I took
down all the consumers, the idle CPU on the brokers went to about 4%.

To test, we decided to mirror active traffic in our new cluster to the old 
cluster (which now has no running consumers). The CPU didn't budge; it's still 
at ~4% as expected with the low throughput.

One more thing to add: I ran a thread profiler on a couple of brokers when the
old cluster was taking active traffic with running consumers and the CPU was
maxed out. Each time, I saw the ReplicaFetcherThread eating up around 40% of
CPU time.

Can you give any advice on what might be the root cause of this?

Thanks,
Brandon
