Cheers from NYC!

I'm trying to give a performance number to a potential client (in the
financial sector) who asked me the following question:

*"If I have a Kafka system setup in the best way possible for performance,
what is an approximate number that I can have in mind for the throughput of
this system?"*

The client proceeded to say:

*"What I want to know specifically, is how many messages per second can I
send from one side of my distributed system to the other side with Apache
Kafka."*

And he concluded with:

*"To give you an example, let's say I have 10 million messages that I need
to send from producers to consumers. Let's assume I have 1 topic, 1
producer for this topic, 4 partitions for this topic and 4 consumers, one
for each partition. What I would like to know is: How long is it going to
take for these 10 million messages to travel all the way from the producer
to the consumers? That's the throughput performance number I'm interested
in."*
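
Put differently, the number he is asking for boils down to message count
divided by sustained end-to-end rate. A minimal sketch in Python, assuming a
steady rate and ignoring startup time and per-message latency; the function
name and the sample rates are hypothetical placeholders, not Kafka
measurements:

```python
def transfer_seconds(total_messages: int, msgs_per_sec: float) -> float:
    """Time to move total_messages at a steady end-to-end rate."""
    if msgs_per_sec <= 0:
        raise ValueError("msgs_per_sec must be positive")
    return total_messages / msgs_per_sec


TOTAL_MESSAGES = 10_000_000  # the client's example workload

# Placeholder rates only; real values would have to come from a benchmark.
for rate in (10_000, 100_000, 1_000_000):
    seconds = transfer_seconds(TOTAL_MESSAGES, rate)
    print(f"{rate:>9,} msgs/sec -> {seconds:,.0f} seconds")
```

Whatever rate goes in would have to be measured on the client's own setup
(1 topic, 1 producer, 4 partitions, 4 consumers).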

I read in a Reddit post yesterday (for some reason I can't find the post
anymore) that Kafka is able to handle 7 trillion messages per day. The
LinkedIn article about it says:

*"We maintain over 100 Kafka clusters with more than 4,000 brokers, which
serve more than 100,000 topics and 7 million partitions. The total number
of messages handled by LinkedIn’s Kafka deployments recently surpassed 7
trillion per day."*

The OP of the Reddit post went on to say that WhatsApp handles around
64 billion messages per day (740,000 msgs per sec x 60 x 60 x 24), and that
LinkedIn's 7 trillion per day is a huge number by comparison: a whopping 81
million messages per second. But that doesn't matter for my question.

7 trillion messages divided by 7 million partitions gives us 1 million
messages per day per partition. So to calculate the throughput we do:

    1 million divided by 60 divided by 60 divided by 24 => *about 11.6
    messages per second per partition*

We'll all agree that about 11.6 messages per second per partition is a very
low throughput number, so I can't give it to my potential client.
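
For anyone who wants to check the arithmetic, here is a minimal sketch of that
division in Python. The variable names are my own, and the results are only
averages derived from the two figures in the LinkedIn quote:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily message count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures taken from the LinkedIn quote.
linkedin_msgs_per_day = 7_000_000_000_000  # 7 trillion
linkedin_partitions = 7_000_000            # 7 million

per_partition_per_day = linkedin_msgs_per_day / linkedin_partitions
per_partition_per_sec = per_second(per_partition_per_day)
total_per_sec = per_second(linkedin_msgs_per_day)

print(f"per partition per day: {per_partition_per_day:,.0f}")
print(f"per partition per sec: {per_partition_per_sec:.2f}")
print(f"all partitions per sec: {total_per_sec:,.0f}")
```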

So my question is: *What number should I give to my potential client?* Note
that he is a stubborn and strict bank CTO, so he won't accept hand-waving
from me. He wants a mathematical answer backed by the scientific method.

Has anyone been in my shoes and can shed some light on this question of
Kafka throughput performance?

Cheers,

M. Queen
