I have a large Kafka deployment on virtual hardware: 120 brokers, each an 8-core
virtual machine with 32 GB of memory, on a gigabit network running RHEL 6.7. There
are 4 topics with 1,200 partitions each and a replication factor of 2, and we are
running Kafka 0.8.1.2.


We are running into issues where our cluster is not keeping up. We have 4
sets of producers (30 producers per set) producing to the 4 topics (each
producer produces to multiple topics). The messages are about 150 bytes on
average, and we are attempting to produce between 1 million and 2 million
messages a second per producer set.
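
For context, we are on the plain 0.8 producer API (kafka.javaapi.producer.Producer).
A stripped-down sketch of how one of our producers is wired up looks roughly like
this; the broker names and the specific settings are illustrative rather than our
exact values:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // subset of the 120 brokers
            props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[] payloads
            props.put("request.required.acks", "1");      // wait for the leader only
            props.put("producer.type", "async");          // batch sends on a background thread
            props.put("compression.codec", "snappy");

            Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));
            producer.send(new KeyedMessage<byte[], byte[]>("topic-1", new byte[150])); // ~150-byte payload
            producer.close();
        }
    }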


We run into issues at about 1 million messages a second for a given producer set:
the producer buffers fill up and we are blocked from producing further messages.
This does not seem to impact the other producer sets, which run without issues
until they too reach about 1 million messages a second.
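
If it helps, my understanding is that the blocking happens in the async producer's
in-memory queue: the background send thread cannot drain it as fast as we enqueue,
so send() eventually blocks. The producer settings that govern this, shown with what
I believe are the 0.8 defaults, are:

    queue.buffering.max.messages=10000  # depth of the per-producer async queue
    queue.enqueue.timeout.ms=-1         # -1 makes send() block when that queue is full
    queue.buffering.max.ms=5000         # max time to buffer before a batch is sent
    batch.num.messages=200              # messages drained per batch by the send thread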


Looking at the metrics available to us, we do not see an obvious bottleneck: disk
I/O is not maxing out, and CPU and network utilization are nominal. We have tried
increasing and decreasing the Kafka cluster size to no avail, and we have gone
from 100 partitions to 1,200 partitions per topic. We have also increased and
decreased the number of producers, yet we run into the same issues. Our Kafka
config is mostly out of the box: a 1-hour log roll/retention and slightly
increased buffer sizes, but otherwise defaults.
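
Concretely, "mostly out of the box" means something along these lines on the broker
side; the socket buffer values are illustrative rather than our exact numbers:

    # server.properties (excerpt)
    log.roll.hours=1
    log.retention.hours=1
    socket.send.buffer.bytes=1048576      # bumped from the 100 KB default
    socket.receive.buffer.bytes=1048576   # bumped from the 100 KB default
    # num.network.threads, num.io.threads, num.replica.fetchers and the
    # log.flush.* settings are still at their defaults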


I was wondering if anyone has recommendations for identifying the bottleneck,
and/or which configuration values we should be taking a look at. Are there known
issues with Kafka on virtualized hardware, or things to watch out for when
deploying to VMs? Are there use cases where Kafka is being used in a similar way,
i.e. 4+ million messages a second of discrete 150-byte messages?
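
For reference, a minimal standalone probe that measures raw per-producer send
throughput would look something like the following; it is only a sketch, and the
topic name and broker address are placeholders:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ThroughputProbe {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");
            props.put("serializer.class", "kafka.serializer.DefaultEncoder");
            props.put("request.required.acks", "1");
            props.put("producer.type", "async"); // send() blocks once the async queue is full,
                                                 // so over a long run this approximates end-to-end rate

            Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));
            byte[] payload = new byte[150];      // same size as our real messages
            int count = 10000000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < count; i++) {
                producer.send(new KeyedMessage<byte[], byte[]>("probe-topic", payload));
            }
            producer.close();                    // drains whatever is still queued
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.printf("%d msgs in %d ms = %.0f msgs/s%n",
                    count, elapsedMs, count * 1000.0 / elapsedMs);
        }
    }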


Kind regards,


Jahn Roux 