Hi Sachin,

I’m not an “operational” Kafka person, but I do have some limited experience
with Kafka benchmarking etc., so here are a few ideas.

I’m playing around with a Kafka tiered storage sizing model at present,
designed to predict minimum IO and network requirements with local and tiered
storage enabled. It may help you get a ballpark figure for your IO and network
requirements.

It’s available here (untested at present, just a best-guess model):
https://github.com/instaclustr/code-samples/tree/main/Kafka/TieredStorage

For your scenario, assuming SSD local storage and a fan-out of 1 (which may not
match your case, as you said you have more producers than consumers?), I came
up with:

Base load: 18 MB/s IO, 24 MB/s network (both are cluster totals)
Higher load: 1,800 MB/s IO, 2,400 MB/s network
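As a sanity check, figures like these fall out of simple arithmetic. Here is a
minimal sketch in Python (the assumptions are mine: replication factor 3,
600-byte compressed messages, fan-out 1; the repo model may differ in its
details):

    # Back-of-envelope Kafka IO/network sizing (assumptions are mine:
    # replication factor 3, 600 B compressed messages, fan-out 1).
    msg_rate = 10_000        # messages/s (base load)
    msg_size = 600           # bytes per message, compressed
    rf = 3                   # replication factor (assumed)
    fan_out = 1              # total consumer read amplification

    ingress = msg_rate * msg_size / 1e6      # 6 MB/s into the leaders
    disk_io = ingress * rf                   # every replica writes: 18 MB/s
    network = ingress * (1 + (rf - 1) + fan_out)
    # producer in + replication to followers + consumer out = 24 MB/s

    print(f"IO: {disk_io:.0f} MB/s, network: {network:.0f} MB/s")
    # Multiply by 100x for the 1M messages/s target: 1,800 / 2,400 MB/s.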

It’s likely you will need more CPU (= more brokers or more vCPUs per broker) as
well. What CPU utilisation does your cluster have with the current load?

For consumer scaling, Little’s law is useful if you know the per-message
processing latency: the number of consumers (and therefore partitions) needed
is simply target throughput x latency, again likely more than you currently
have. If you only have 1 consumer you only need 1 partition, but with 35
partitions you can scale to 35 consumers, and more if you increase the
partition count.
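For example, here is that calculation with a hypothetical per-message latency
of 50 microseconds (substitute your own measured value):

    # Little's law: required concurrency = throughput x latency.
    # The 50 us per-message latency is hypothetical; use your own
    # measured processing latency instead.
    target_throughput = 1_000_000    # messages/s
    processing_latency = 50e-6       # seconds per message (hypothetical)

    consumers_needed = target_throughput * processing_latency
    print(f"Minimum consumers (and partitions): {consumers_needed:.0f}")
    # 1,000,000 x 50e-6 = 50 consumers, which already exceeds your
    # 35 partitions, so the partition count would need to rise too.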

I’m working on an open source Kafka performance model this year, which will
hopefully include CPU as well as IO and network.

However, a simple linear extrapolation model I’m working on at present suggests
you may need a minimum of 87 vCPUs in your cluster for the higher load. This
model tends to be a bit pessimistic in general, and it doesn’t take into
account other factors such as message rate, message size, fan-out, and the
number of consumers and partitions, all of which potentially consume CPU.
Good luck!
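To give a feel for the shape of such an extrapolation, here is a sketch (this
is only my guess at the form, not the actual model, and the utilisation figure
is a hypothetical input back-fitted purely for illustration):

    # Sketch of a simple linear CPU extrapolation. The utilisation
    # figure is hypothetical, back-fitted for illustration only.
    current_vcpus = 5 * 4      # 5 brokers x 4 vCPUs each
    current_rate = 10_000      # messages/s now
    target_rate = 1_000_000    # messages/s goal
    utilisation = 0.0435       # hypothetical CPU fraction busy at current load

    vcpus_busy = current_vcpus * utilisation
    vcpus_needed = vcpus_busy * (target_rate / current_rate)
    print(f"Minimum vCPUs at target load: {vcpus_needed:.0f}")   # ~87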

Regards, Paul Brebner

From: Sachin Mittal <sjmit...@gmail.com>
Date: Friday, 31 January 2025 at 7:41 pm
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: How to scale a Kafka Cluster, what all should we consider

Hi,
I just wanted to have some general discussion around the topic of how to
scale up a Kafka cluster.

Currently we are running a 5 node Kafka Cluster.

Each node has *4 vCPU* and *8 GiB* memory.
I have a topic which is partitioned *35* ways.
I have *5* producers publishing messages to that topic.
I have *1* consumer consuming messages from that topic.
Each message is a JSON string, say *2-4 KB* uncompressed and *600 bytes*
compressed.

Right now this cluster can handle *10,000* messages per second.
I see no lag on the producer or consumer side.

I would like to scale the cluster to handle *1 million* messages per second.
What are the areas I should look into, and in what order?

My producers and consumers can scale independently.
I can run 35 producers and consumers if need be.

Questions I have are
1. Should I increase the number of partitions for that topic from 35?
2. Should I increase the number of brokers from 5?
3. Should I increase the instance size in terms of memory or CPU per node?
4. Or should it be a combination of any of the above options?
5. Are there any other settings on the producer, broker, or topic side that I
should consider?

Thanks
Sachin
