Hi Sachin – yes, my estimate of 87 is only approximate and has a significant 
error margin. It also assumes 100% CPU utilisation, which you don’t want – 50% 
is more normal for production.

35 partitions is likely low for the target throughput, but you can
incrementally increase partitions and consumers as you go.

Paul

From: Sachin Mittal <sjmit...@gmail.com>
Date: Monday, 3 February 2025 at 3:41 pm
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Re: How to scale a Kafka Cluster, what all should we consider

Hi Paul,
For the current cluster I see CPU utilization of 10% and memory utilization
of 50%.
We were thinking of 7 nodes of 16 vCPUs each, which would translate to
112 vCPUs at the cluster level – significantly more than the 87 you
recommended.

Regarding the number of partitions, 35 seems OK as our consumers should
rarely need to scale beyond 10 or so.

Please let me know if this sounds OK given our current utilization rates.

Thanks
Sachin


On Mon, Feb 3, 2025 at 7:51 AM Brebner, Paul
<paul.breb...@netapp.com.invalid> wrote:

> Hi Sachin,
>
> I’m not an “operational” Kafka person but do have some limited experience
> with Kafka benchmarking etc, so here are a few ideas.
>
> I’m playing around with a Kafka tiered storage sizing model at present,
> designed to predict min IO and/or network with local and tiered storage
> enabled. This may help you get a ball park figure for IO and network
> requirements.
>
> It’s available here (untested at present – just a best-guess model):
> https://github.com/instaclustr/code-samples/tree/main/Kafka/TieredStorage
>
> For your scenario, assuming SSD local storage and a fan-out of 1 (which may
> not be your case, as you said you have more producers than consumers?) I
> came up with:
>
> Base load: 18 MB/s IO, 24 MB/s network (cluster totals)
> Higher load: 1,800 MB/s IO, 2,400 MB/s network
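As a sanity check on these figures, the arithmetic can be sketched as below, assuming a replication factor of 3 and the 600-byte compressed message size from the original question (both assumptions on my part):

```python
# Rough cluster sizing sketch: IO is the produce bandwidth written by every
# replica; network is produce in + inter-broker replication + consume out.
def cluster_load(msgs_per_sec, msg_bytes, replication=3, fanout=1):
    produce = msgs_per_sec * msg_bytes / 1_000_000    # MB/s into the cluster
    io = produce * replication                        # each replica writes the data
    network = produce * (1 + (replication - 1) + fanout)
    return io, network

print(cluster_load(10_000, 600))      # base load -> (18.0, 24.0) MB/s
print(cluster_load(1_000_000, 600))   # higher load -> (1800.0, 2400.0) MB/s
```

With these assumptions the sketch reproduces both the base-load and higher-load figures above.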
>
> It’s likely you will need more CPU (= more brokers or more VCPUs per
> broker) as well.  What CPU utilisation does your cluster have with the
> current load?
>
> For consumer scaling, Little’s law is useful if you know the processing
> latency – the minimum number of consumers (and hence partitions) is simply
> target throughput x latency, again likely more than you currently have.
> If you only have 1 consumer currently you only need 1 partition – but with
> 35 partitions you can scale to 35 consumers, and more if you increase the
> partition count.
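To make Little's law concrete, here is a minimal sketch; the per-message latencies used in the examples are purely hypothetical placeholders:

```python
import math

# Little's law: concurrency = throughput x latency. The minimum number of
# consumers (and hence partitions) is the target message rate times the
# per-message processing latency.
def min_consumers(target_msgs_per_sec, latency_us):
    return math.ceil(target_msgs_per_sec * latency_us / 1_000_000)

print(min_consumers(1_000_000, 20))    # 20 us/msg -> 20, fits in 35 partitions
print(min_consumers(1_000_000, 100))   # 100 us/msg -> 100, needs more partitions
```

So at 1M msgs/s, whether 35 partitions is enough hinges entirely on the measured consumer processing latency.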
>
> I’m working on an open source Kafka performance model this year, which will
> hopefully include CPU as well as IO and network.
>
> However, a simple linear extrapolation model I’m working on at present
> suggests you may need a minimum of 87 vCPUs in your cluster for the higher
> load (this model tends to be a bit pessimistic in general, and doesn’t take
> into account other factors such as message rate, message size, fan-out, and
> the number of consumers and partitions, all of which potentially consume
> CPU). Good luck!
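Paul's actual model isn't shown, but a generic linear extrapolation of this kind can be sketched as below. The calibration numbers (20 vCPUs at 10% utilisation, 100x throughput, 50% target utilisation) are illustrative only, and this naive version comes out even more pessimistic than the 87 figure:

```python
# Naive linear CPU extrapolation: scale the vCPUs currently busy by the
# throughput multiplier, then divide by a target utilisation for headroom.
# Utilisations are integer percentages to keep the arithmetic exact.
def vcpus_needed(current_vcpus, current_util_pct, scale_factor, target_util_pct=50):
    return current_vcpus * current_util_pct * scale_factor / target_util_pct

# e.g. 5 nodes x 4 vCPUs at 10% utilisation, 100x throughput target:
print(vcpus_needed(20, 10, 100))   # -> 400.0
```

The gap between such naive extrapolations and calibrated models is exactly why benchmarking at intermediate load levels matters.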
>
> Regards, Paul Brebner
>
> From: Sachin Mittal <sjmit...@gmail.com>
> Date: Friday, 31 January 2025 at 7:41 pm
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: How to scale a Kafka Cluster, what all should we consider
>
> Hi,
> I just wanted to have a general discussion around how to scale up a Kafka
> cluster.
>
> Currently we are running a 5 node Kafka Cluster.
>
> Each node has *4 vCPUs* and *8 GiB* memory.
> I have a topic which is partitioned *35* ways.
> I have *5* producers publishing messages to that topic.
> I have *1* consumer consuming messages from that topic.
> Each message is a JSON string, say *2–4 KB* uncompressed and *600 bytes*
> compressed.
>
> Right now this cluster can handle *10,000* messages per second.
> I see no lag in the producer or consumer side.
>
> I would like to scale the cluster to handle *1 million* messages per
> second.
> What areas should I look into, and in what order?
>
> My producers and consumers can scale independently.
> I can run 35 producers and consumers if need be.
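For context on the raw bandwidth involved at the target rate, a back-of-envelope sketch using the compressed message size above (before replication and consumer traffic, which multiply it further):

```python
# Produce bandwidth at the target rate, using the 600 B compressed size.
msgs_per_sec = 1_000_000
compressed_bytes = 600
produce_mb_per_sec = msgs_per_sec * compressed_bytes / 1_000_000
print(produce_mb_per_sec)   # -> 600.0 MB/s into the cluster
```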
>
> Questions I have:
> 1. Should I increase the number of partitions for the topic from 35?
> 2. Should I increase the number of brokers from 5?
> 3. Should I increase the instance size in terms of memory or CPU per node?
> 4. Or should it be a combination of the above options?
> 5. Are there any other settings on the producer, broker, or topic side that
> I should consider?
>
> Thanks
> Sachin
>
