Hi Joris,

I've spoken to him. His answers are below:


On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters <joris.mg.peet...@gmail.com>
wrote:

> There are a few unknown parameters here that might influence the answer,
> though. Off the top of my head, at least:
> - How much replication of the data is needed (for high availability), and
> how many acks for the producer? (Fire-and-forget can be faster; if you
> need to replicate and ack from 3 brokers in different DCs then it will be
> slower)
>

Let's assume no high availability for now, for simplicity's sake.
Fire-and-forget, like he said. We don't want to overcomplicate this simple
benchmark, and we want the highest possible throughput number.
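
For concreteness, fire-and-forget maps to the producer's `acks` setting. The
names below are just how kafka-python spells the values; the thread doesn't
fix a client library, so that choice is my assumption:

```python
# Producer acknowledgement levels; fire-and-forget is acks=0.
ACKS_FIRE_AND_FORGET = 0  # producer never waits for a broker response (fastest)
ACKS_LEADER_ONLY = 1      # wait only for the partition leader's write
ACKS_ALL = "all"          # wait for all in-sync replicas (slowest, most durable)
```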


> - Transactions? (If end-to-end exactly-once then it's a lot slower)
>

Again no transactions. Let's keep it simple.


> - Size of the messages? (If each message is a GB it will obviously be
> slower)
>

Let's assume 512 bytes. Powers of two are fun!
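
For what it's worth, at 512 bytes each the raw payload for the 10 million
messages is easy to bound; this is pure arithmetic, nothing assumed beyond
the numbers above:

```python
# Back-of-the-envelope payload volume for the benchmark parameters above.
MESSAGES = 10_000_000
MESSAGE_SIZE = 512  # bytes per message, as agreed

total_bytes = MESSAGES * MESSAGE_SIZE
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")  # 5,120,000,000 bytes = 4.77 GiB
# At, say, 1 million msgs/s the producer would be pushing ~512 MB/s of
# payload over loopback, before protocol framing and batching overhead.
```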


> - Distance and bandwidth between the producers, Kafka & the consumers? (If
> the network links get saturated that would limit the performance. Latency
> is likely less important than throughput, but if your consumers are in
> Tokyo and the producer in London then it will likely also be slower)
>


Loopback, same machine, for the love of God. Let's not even go there. We
want the highest possible throughput; I accept the limit of the speed of
light. If network particularities and distances are included in this
measurement, then it is basically worthless. Loopback eliminates all the
network variables that we surely don't want in this benchmark.


>
> FWIW, I find that the producer side is generally the limiting factor,
> especially if there is only one.
> I'd take a look at e.g. the Appendix test details on
> https://docs.confluent.io/2.0.0/clients/librdkafka/INTRODUCTION_8md.html.
> I haven't yet seen a faster Kafka impl than rdkafka, so those would be
> reasonable upper bounds.
>


Thanks for your reply, Joris. Can you point me to a Hello World Kafka
example, so I can set up this very basic and BARE BONES Kafka system,
without any of the complications you correctly mentioned above? I have 10
million messages that I need to send from producers to consumers. I have 1
topic, 1 producer for this topic, 4 partitions for this topic and 4
consumers, one for each partition. Everything loopback, same machine, no
high availability, no transactions, etc. Just KAFKA BARE BONES. What could
be more trivial and basic than that?
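
I can't speak for a canonical Hello World, but here is a minimal sketch of
the producer half of that benchmark. Everything named in it is an assumption
on my part, since nothing in this thread fixes them: the kafka-python client,
a broker on localhost:9092, and a topic `bench` pre-created with 4
partitions:

```python
import time

MESSAGES = 10_000_000
PAYLOAD = bytes(512)  # the agreed 512-byte message

def msgs_per_sec(n_messages: int, elapsed_s: float) -> float:
    """The throughput number the customer is asking for."""
    return n_messages / elapsed_s

def run_producer(bootstrap: str = "localhost:9092", topic: str = "bench") -> float:
    """Fire-and-forget produce loop; needs a running broker and kafka-python."""
    from kafka import KafkaProducer  # pip install kafka-python

    # acks=0 is fire-and-forget: the producer does not wait for any broker
    # acknowledgement, matching the no-HA, no-transactions scenario above.
    producer = KafkaProducer(bootstrap_servers=bootstrap, acks=0)
    start = time.monotonic()
    for _ in range(MESSAGES):
        producer.send(topic, PAYLOAD)  # keyless sends spread over the 4 partitions
    producer.flush()  # drain the client-side buffer before stopping the clock
    elapsed = time.monotonic() - start
    return msgs_per_sec(MESSAGES, elapsed)

# Example of the arithmetic only: if the run took 12.5 s end to end,
# the headline number would be 800,000 msgs/s.
print(msgs_per_sec(MESSAGES, 12.5))
```

The consumer half would be four plain consumer loops, one per partition, and
the clock stops when the last of the 10 million messages arrives. Kafka's
bundled kafka-producer-perf-test script does essentially this produce loop
for you, if you'd rather not write code.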

Cheers,

M. Queen


>
> On Thu, Jan 6, 2022 at 4:25 PM Marisa Queen <marisa.queen...@gmail.com>
> wrote:
>
> > Hi Israel,
> >
> > Your email is great, but I'm afraid to forward it to my customer because
> > it doesn't answer his question.
> >
> > I'm hoping that other members of this list will be able to give me a
> > more NUMERIC answer; let's wait and see.
> >
> > Just to give you some follow-up on your answer, when you say:
> >
> > > 30 passengers per driver or aircraft per day may not sound impressive,
> > > but 750,000 passengers per day all together is how you should look at
> > > it.
> >
> > Well, with this rationale one can come up with any desired throughput
> > number by just adding more partitions. Do you see my customer's point
> > that this does not make any sense? Adding more partitions also does not
> > come for free, because messages need to be redistributed across the
> > newly created partitions and ordering will be lost. Order is important
> > for some messages, so adding ever more partitions towards infinite
> > throughput is not an option.
> >
> > I've just spoken to him here, his reply was:
> >
> > "Marisa, I'm asking a very simple question for a very basic Kafka
> scenario.
> > If I can't get an answer for that, then I'm in trouble. Can you please
> find
> > out with your peers/community what is a good throughput number to have in
> > mind for the scenario I've been describing. Again it is a very basic and
> > simple scenario: I have 10 million messages that I need to send from
> > producers to consumers. Let's assume I have 1 topic, 1 producer for this
> > topic, 4 partitions for this topic and 4 consumers, one for each
> partition.
> > What I would like to know is: How long is it going to take for these 10
> > million messages to travel all the way from the producer to the
> consumers?
> > That's the throughput performance number I'm interested in."
> >
> > I surely won't tell him: "Hey, that's easy, you have 4 partitions, each
> > partition according to LinkedIn can handle roughly 12 messages per
> > second, so we are looking at a throughput of roughly 46 messages per
> > second here!"
> >
> > Cheers,
> >
> > M. Queen
> >
> >
> > On Thu, Jan 6, 2022 at 12:58 PM Israel Ekpo <israele...@gmail.com>
> > wrote:
> >
> > > Hi Marisa
> > >
> > > I think there may be some confusion about the throughput for each
> > partition
> > > and I want to explain briefly using some analogies
> > >
> > > Using transportation as an example: if we were to pick an airline or
> > > ridesharing organization to describe the volume of customers they can
> > > support per day, we would have to look at how many total customers
> > > American Airlines can service in a day, or how many customers Uber or
> > > Lyft can serve in a day. We would not zero in on only the number of
> > > customers a particular driver can service or the number of passengers
> > > a particular aircraft can service in a day. That would be very
> > > limiting considering the hundreds of thousands of aircraft or drivers
> > > actively transporting passengers in real time.
> > >
> > > 30 passengers per driver or aircraft per day may not sound impressive,
> > > but 750,000 passengers per day all together is how you should look at
> > > it.
> > >
> > > Partitions in Kafka are just a logical unit for organizing and
> > > storing data within a Kafka topic. You should not base your analysis
> > > on just what a subunit of storage is able to support.
> > >
> > > I would recommend taking a look at Kafka Summit talks on performance
> > > and benchmarks to get some understanding of what Kafka is able to do
> > > and the applicable use cases in the Financial Services industry.
> > >
> > > A lot of reputable organizations already trust Kafka today for their
> > > needs, so this is already proven:
> > >
> > > https://kafka.apache.org/powered-by
> > >
> > > I hope this helps.
> > >
> > > Israel Ekpo
> > > Lead Instructor, IzzyAcademy.com
> > > https://www.youtube.com/c/izzyacademy
> > > https://izzyacademy.com/
> > >
> > >
> > > On Thu, Jan 6, 2022 at 10:01 AM Marisa Queen <marisa.queen...@gmail.com>
> > > wrote:
> > >
> > > > Cheers from NYC!
> > > >
> > > > I'm trying to give a performance number to a potential client (from
> > > > the financial market) who asked me the following question:
> > > >
> > > > *"If I have a Kafka system set up in the best way possible for
> > > > performance, what is an approximate number that I can have in mind
> > > > for the throughput of this system?"*
> > > >
> > > > The client proceeded to say:
> > > >
> > > > *"What I want to know specifically, is how many messages per second
> > can I
> > > > send from one side of my distributed system to the other side with
> > Apache
> > > > Kafka."*
> > > >
> > > > And he concluded with:
> > > >
> > > > *"To give you an example, let's say I have 10 million messages that I
> > > need
> > > > to send from producers to consumers. Let's assume I have 1 topic, 1
> > > > producer for this topic, 4 partitions for this topic and 4 consumers,
> > one
> > > > for each partition. What I would like to know is: How long is it
> going
> > to
> > > > take for these 10 million messages to travel all the way from the
> > > producer
> > > > to the consumers? That's the throughput performance number I'm
> > interested
> > > > in."*
> > > >
> > > > I read in a reddit post yesterday (for some reason I can't find the
> > > > post anymore) that Kafka is able to handle 7 trillion messages per
> > > > day. The LinkedIn article about it says:
> > > >
> > > > *"We maintain over 100 Kafka clusters with more than 4,000 brokers,
> > > > which serve more than 100,000 topics and 7 million partitions. The
> > > > total number of messages handled by LinkedIn’s Kafka deployments
> > > > recently surpassed 7 trillion per day."*
> > > >
> > > > The OP of the reddit post went on to say that WhatsApp is handling
> > > > around 64 billion messages per day (740,000 msgs per sec x 24 x 60 x
> > > > 60) and that 7 trillion for LinkedIn is a huge number, giving a
> > > > whopping 81 million messages per second for LinkedIn. But that
> > > > doesn't matter for my question.
> > > >
> > > > 7 trillion messages divided by 7 million partitions gives us 1
> > > > million messages per day per partition. So to calculate the
> > > > throughput we do:
> > > >
> > > >     1 million divided by 60 divided by 60 divided by 24 => *roughly
> > > > 12 messages per second per partition*
> > > >
> > > > We'll all agree that roughly 12 messages per second per partition
> > > > for throughput performance is very low, so I can't give this number
> > > > to my potential client.
> > > >
> > > > So my question is: *What number should I give to my potential
> > > > client?* Note that he is a stubborn and strict bank CTO, so he won't
> > > > take any talk from me. He wants a mathematical answer using the
> > > > scientific method.
> > > >
> > > > Has anyone been in my shoes and can shed some light on this Kafka
> > > > throughput performance topic?
> > > >
> > > > Cheers,
> > > >
> > > > M. Queen
> > > >
> > >
> >
>
