Re: Kafka performance when it comes to throughput

Israel Ekpo Thu, 06 Jan 2022 15:53:14 -0800

Marisa,

I do not agree with your assessment. There are several factors that could
influence your performance numbers even with localhost. Your project should
be configured based on your own needs.


Your throughput could go up or down depending on how you configure it,
based on what is important for your use case(s).

If you have other apps running on the machine, they will impact your
results. If you only have a 2-CPU, 4 GB laptop, you obviously cannot compare
the results with a server that has 256 GB of RAM and 64 cores.

Also, do not measure it in terms of messages per second but rather in terms
of data volume per second. A throughput of 100 GB/s gives you 100 messages
per second at 1 GB per message, or 100 million messages per second at 1 KB
each: if you have smaller messages, the same volume yields a higher message
count per unit of time.
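A minimal sketch of that conversion, assuming decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes). The helper name `messages_per_second` is hypothetical, and the calculation ignores Kafka's per-record framing overhead:

```python
GB = 10**9  # decimal gigabyte (assumption; use 2**30 for GiB)
KB = 10**3  # decimal kilobyte (assumption; use 2**10 for KiB)


def messages_per_second(bytes_per_second: float, message_size_bytes: float) -> float:
    """Convert a data-volume throughput into a message-count throughput."""
    if message_size_bytes <= 0:
        raise ValueError("message size must be positive")
    return bytes_per_second / message_size_bytes


rate = 100 * GB  # 100 GB/s of payload
print(messages_per_second(rate, 1 * GB))  # 100.0
print(messages_per_second(rate, 1 * KB))  # 100000000.0
```

The same byte rate corresponds to message counts six orders of magnitude apart, which is why a bare messages-per-second figure is hard to compare across workloads.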

Take a look at the reference architecture and this best-practices document
for how to optimize your performance based on your project goals
(durability, latency, throughput and availability):

Confluent Platform Reference Architecture - Confluent
<https://www.confluent.io/thank-you/resources/apache-kafka-confluent-enterprise-reference-architecture/>
Kafka Best Practices: Build, Monitor & Optimize Kafka in Confluent Cloud
<https://www.confluent.io/thank-you/resources/recommendations-developers-using-confluent-cloud/>

Everybody's scenario and use case will affect how they set up their
project. You cannot look at another project and use its numbers for your
own setup. That is generally a bad idea; the better answer is that you
need to define your project objectives and then figure out what is
needed to achieve those goals.

A better approach is to look at your throughput volume, retention policy
and period, and environment, and then work out the capacity planning
necessary to support what you need.
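As a rough illustration of the capacity-planning arithmetic (not a replacement for the sizing tools mentioned in this message), retained disk space scales with ingress rate x retention period x replication factor. The function name and the 50 MB/s, 7-day, replication-factor-3 inputs are made-up examples, and the estimate ignores compression, index files and free-space headroom:

```python
MB = 10**6  # decimal megabyte (assumption)


def required_storage_bytes(ingress_bytes_per_sec: float,
                           retention_hours: float,
                           replication_factor: int) -> float:
    """Rough disk estimate: every byte written is kept for the retention
    period and stored once per replica (no compression, no headroom)."""
    retention_seconds = retention_hours * 3600
    return ingress_bytes_per_sec * retention_seconds * replication_factor


# Hypothetical inputs: 50 MB/s ingress, 7-day retention, 3 replicas.
needed = required_storage_bytes(50 * MB, 7 * 24, 3)
print(f"{needed / 10**12:.2f} TB")  # 90.72 TB
```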

You can achieve any performance benchmark you are willing to pay for. I am
not a fan of blindly copying other people's numbers and using them out of
context in benchmark comparisons.

Take a look at the capacity planner and sizing calculator to figure out
what hardware and infrastructure you need for your scenario:

Sizing Calculator for Apache Kafka and Confluent Platform (eventsizer.io)
<https://eventsizer.io/>

I hope this is more useful.


Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://www.youtube.com/c/izzyacademy
https://izzyacademy.com/


On Thu, Jan 6, 2022 at 6:07 PM Marisa Queen <marisa.queen...@gmail.com>
wrote:

> Hi Joris,
>
> Thank you so much, friend!
>
> > I appreciate that setting up everything on localhost will be easier and
> > lead to big numbers, but bear in mind that it's typically all the other
> > real-life stuff (remote connections, replication, at-least-once, ...) that
> > causes massive slowdowns compared to localhost
>
> Totally agree! But we must establish a ceiling first. If this
> super-good-loopback number doesn't look good, then one has no business
> moving forward with Kafka to the more complex (and of course slower) stuff.
>
> The purpose of the ceiling is that. It is your maximum ambition represented
> by a number. You can't go any higher than that. At least with Kafka.
>
> Agree?
>
> Cheers,
>
> M. Queen
>
> On Thu, Jan 6, 2022 at 3:51 PM Joris Peeters <joris.mg.peet...@gmail.com>
> wrote:
>
> > These tutorials (though quite a bit outdated) seem quite useful:
> > http://cloudurable.com/blog/kafka-tutorial-kafka-producer/index.html (and
> > the follow-ups).
> > They end up being close to how I write this in Java, and tutorial 13 talks
> > about batching, acks, etc., which you'll need to tune in order to maximise
> > your throughput.
> >
> > I'm sure someone else has better example resources.
> >
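To make the batching and acks trade-off concrete, here is a sketch of two producer configurations expressed as a plain Python dict. The keys (`acks`, `linger.ms`, `batch.size`, `compression.type`, `enable.idempotence`) are standard producer properties accepted by the Java client and by librdkafka-based clients such as confluent-kafka; the specific values and the `producer_config` helper are illustrative assumptions, not tuned recommendations:

```python
def producer_config(bootstrap_servers: str, favor_throughput: bool) -> dict:
    """Build producer settings as a plain dict of Kafka property names."""
    config = {"bootstrap.servers": bootstrap_servers}
    if favor_throughput:
        config.update({
            "acks": "1",                    # leader-only ack; "0" is fire-and-forget
            "enable.idempotence": "false",  # idempotence requires acks=all
            "linger.ms": "20",              # wait up to 20 ms to fill a batch
            "batch.size": "262144",         # up to 256 KiB per partition batch
            "compression.type": "lz4",      # fewer bytes on the wire and on disk
        })
    else:
        config.update({
            "acks": "all",                  # wait for all in-sync replicas
            "enable.idempotence": "true",   # no duplicates from producer retries
        })
    return config


fast = producer_config("localhost:9092", favor_throughput=True)
safe = producer_config("localhost:9092", favor_throughput=False)
print(fast["acks"], safe["acks"])  # 1 all
```

With confluent-kafka the dict can be passed straight to `confluent_kafka.Producer(config)`; in Java the same key/value pairs go into a `java.util.Properties` object.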
> >
> >
> > On Thu, Jan 6, 2022 at 6:25 PM Marisa Queen <marisa.queen...@gmail.com>
> > wrote:
> >
> > > Hi Joris,
> > >
> > > Thank you so much. I plan to write a Java consumer and a Java producer
> > > for my benchmark. Can you recommend an example that I can use as a
> > > reference to write my basic Java producer and simple Java consumer? I'll
> > > certainly share the throughput number I get with the community, and maybe
> > > even write a blog post about it. I hope it is more than 23 messages per
> > > second per partition :PPPPP
> > >
> > > Cheers,
> > >
> > > M. Queen
> > >
> > >
> > > On Thu, Jan 6, 2022 at 2:14 PM Joris Peeters <joris.mg.peet...@gmail.com>
> > > wrote:
> > >
> > > > I'd just follow the instructions in https://kafka.apache.org/quickstart
> > > > to set up Kafka and ZooKeeper on a single node by running the Java
> > > > processes directly. Alternatively, you can run them in Docker.
> > > >
> > > > For the producer and consumer I'd personally use Python, as it's the
> > > > easiest to get going with. You may want to look at
> > > > https://kafka-python.readthedocs.io/en/master/# (easier) and
> > > > https://github.com/confluentinc/confluent-kafka-python (faster). Similar
> > > > libraries exist for Go, Java, C++, ...
> > > > Or I'm sure there are some benchmark setups out there that you can tweak
> > > > a little.
> > > >
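Whatever client is chosen, the measurement itself can stay client-agnostic. The sketch below times an arbitrary `send` callable and reports messages/s and MB/s; `send`, `finish` and `measure_throughput` are hypothetical names, and the in-memory stub used in place of a real producer is an assumption for illustration only:

```python
import time
from typing import Callable, Optional, Tuple


def measure_throughput(send: Callable[[bytes], object],
                       num_messages: int,
                       message_size: int,
                       finish: Optional[Callable[[], object]] = None) -> Tuple[float, float]:
    """Send num_messages payloads of message_size bytes through `send`.

    `finish` runs before the clock stops; for an asynchronous producer it
    should wait for outstanding deliveries (e.g. a flush). Returns
    (messages per second, megabytes per second), with 1 MB = 10**6 bytes.
    """
    payload = b"x" * message_size
    start = time.perf_counter()
    for _ in range(num_messages):
        send(payload)
    if finish is not None:
        finish()
    # Guard against a zero interval on extremely fast stubs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    msgs_per_sec = num_messages / elapsed
    return msgs_per_sec, msgs_per_sec * message_size / 10**6


# Stub "producer" that just collects payloads in memory (not Kafka).
sent = []
mps, mbps = measure_throughput(sent.append, 100_000, 512)
print(f"{mps:,.0f} msg/s, {mbps:,.1f} MB/s (in-memory stub)")
```

With confluent-kafka, `send` could wrap `producer.produce(topic, value)` and `finish` could be `producer.flush`; a realistic loop would also call `producer.poll(0)` periodically and handle `BufferError` when the local queue fills up.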
> > > > I appreciate that setting up everything on localhost will be easier and
> > > > lead to big numbers, but bear in mind that it's typically all the other
> > > > real-life stuff (remote connections, replication, at-least-once, ...)
> > > > that causes massive slowdowns compared to localhost, and those are things
> > > > banks eventually tend to need (I work in the finance industry myself).
> > > > What you're doing is a very useful benchmark, but I'd surround it with the
> > > > above caveats to avoid overpromising.
> > > >
> > > > -J
> > > >
> > > >
> > > > On Thu, Jan 6, 2022 at 4:58 PM Marisa Queen <marisa.queen...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joris,
> > > > >
> > > > > I've spoken to him. His answers are below:
> > > > >
> > > > >
> > > > > On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters <joris.mg.peet...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > There are a few unknown parameters here that might influence the
> > > > > > answer, though. Off the top of my head, at least:
> > > > > > - How much replication of the data is needed (for high availability),
> > > > > >   and how many acks for the producer? (Fire-and-forget can be faster;
> > > > > >   if you need to replicate and ack from 3 brokers in different DCs, it
> > > > > >   will be slower.)
> > > > > >
> > > > >
> > > > > Let's assume no high availability for now, for simplicity's sake.
> > > > > Fire-and-forget, like he said. We don't want to overcomplicate this
> > > > > simple benchmark, and we want the highest possible throughput number.
> > > > >
> > > > >
> > > > > > - Transactions? (If end-to-end exactly-once, then it's a lot slower.)
> > > > > >
> > > > >
> > > > > Again no transactions. Let's keep it simple.
> > > > >
> > > > >
> > > > > > - Size of the messages? (If each message is a GB, it will obviously be
> > > > > >   slower.)
> > > > > >
> > > > >
> > > > > Let's assume 512 bytes. Powers of two are fun!
> > > > >
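With the 512-byte assumption, the scenario's total payload volume follows directly. A minimal sketch with hypothetical helper names, counting payload bytes only (Kafka's per-record and per-batch framing adds some overhead on top):

```python
def total_payload_bytes(num_messages: int, message_size: int) -> int:
    """Payload bytes only; excludes keys, headers and Kafka framing."""
    return num_messages * message_size


def required_rate(total_bytes: int, target_seconds: float) -> float:
    """Sustained bytes per second needed to move total_bytes in target_seconds."""
    return total_bytes / target_seconds


total = total_payload_bytes(10_000_000, 512)
print(f"{total:,} bytes")                              # 5,120,000,000 bytes
print(f"{required_rate(total, 60) / 10**6:.2f} MB/s")  # 85.33 MB/s
```

A target such as "all 10 million messages in one minute" thus becomes a concrete byte rate (about 85 MB/s of payload) that can be compared with what the disk and loopback interface can sustain.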
> > > > >
> > > > > > - Distance and bandwidth between the producers, Kafka and the consumers?
> > > > > >   (If the network links get saturated, that would limit the performance.
> > > > > >   Latency is likely less important than throughput, but if your
> > > > > >   consumers are in Tokyo and the producer in London, then it will likely
> > > > > >   also be slower.)
> > > > > >
> > > > >
> > > > >
> > > > > Loopback, same machine, for the love of God. Let's not even go there.
> > > > > We want the highest possible throughput. I accept the limit of the
> > > > > speed of light. If network particularities and distances are to be
> > > > > included in this measurement, then it is basically worth nothing.
> > > > > Loopback eliminates all those network variables that we surely don't
> > > > > want to include in the benchmark.
> > > > >
> > > > >
> > > > > >
> > > > > > FWIW, I find that the producer side is generally the limiting factor,
> > > > > > especially if there is only one producer.
> > > > > > I'd take a look at, e.g., the Appendix test details on
> > > > > > https://docs.confluent.io/2.0.0/clients/librdkafka/INTRODUCTION_8md.html.
> > > > > > I haven't yet seen a faster Kafka implementation than rdkafka, so those
> > > > > > numbers would be reasonable upper bounds.
> > > > > >
> > > > >
> > > > >
> > > > > Thanks for your reply, Joris. Can you point me to a Hello World Kafka
> > > > > example, so I can set up this very basic, BARE BONES Kafka system
> > > > > without any of the complications you correctly mentioned above? I have
> > > > > 10 million messages that I need to send from producers to consumers. I
> > > > > have 1 topic, 1 producer for this topic, 4 partitions for this topic and
> > > > > 4 consumers, one for each partition. Everything loopback, same machine,
> > > > > no high availability, transactions, etc.: just KAFKA BARE BONES. What
> > > > > can be more trivial and basic than that?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > M. Queen
> > > > >
> > > > >
> > > > > >
> > > > > > On Thu, Jan 6, 2022 at 4:25 PM Marisa Queen <marisa.queen...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Israel,
> > > > > > >
> > > > > > > > Your email is great, but I'm afraid to forward it to my customer because
> > > > > > > > it doesn't answer his question.
> > > > > > > >
> > > > > > > > I'm hoping that other members of this list will be able to give me a more
> > > > > > > > NUMERIC answer; let's wait and see.
> > > > > > > >
> > > > > > > > Just to give you some follow-up on your answer, when you say:
> > > > > > >
> > > > > > > > > 30 passengers per driver or aircraft per day may not sound impressive but
> > > > > > > > > 750,000 passengers per day all together is how you should look at it
> > > > > > >
> > > > > > > > Well, with this rationale one can come up with any desired throughput
> > > > > > > > number by just adding more partitions. Do you see my customer's point that
> > > > > > > > this does not make any sense? Adding more partitions also does not come for
> > > > > > > > free, because messages need to be separated into the newly created
> > > > > > > > partitions and ordering will be lost. Order is important for some messages,
> > > > > > > > so continuing to add partitions in pursuit of infinite throughput is not an
> > > > > > > > option.
> > > > > > >
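The ordering concern can be made concrete. Kafka guarantees order only within a partition, and keyed records are assigned with hash(key) mod number-of-partitions, so changing the partition count remaps most keys. The sketch below uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2 for keyed records), and the `account-N` keys are hypothetical:

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """hash(key) mod N; crc32 stands in for Kafka's murmur2 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"account-{i}" for i in range(1000)]  # hypothetical message keys
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 5)]
print(f"{len(moved)} of {len(keys)} keys change partition going from 4 to 5")
```

For a uniform hash, a key keeps its partition only when its hash leaves the same remainder mod 4 and mod 5, which happens for 4 out of every 20 hash values, so roughly 80% of keys move. Records for one key written before and after the change can then sit in different partitions and be consumed out of order.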
> > > > > > > > I've just spoken to him, and his reply was:
> > > > > > > >
> > > > > > > > "Marisa, I'm asking a very simple question about a very basic Kafka
> > > > > > > > scenario. If I can't get an answer for that, then I'm in trouble. Can you
> > > > > > > > please find out from your peers/community what a good throughput number to
> > > > > > > > have in mind would be for the scenario I've been describing? Again, it is a
> > > > > > > > very basic and simple scenario: I have 10 million messages that I need to
> > > > > > > > send from producers to consumers. Let's assume I have 1 topic, 1 producer
> > > > > > > > for this topic, 4 partitions for this topic and 4 consumers, one for each
> > > > > > > > partition. What I would like to know is: How long is it going to take for
> > > > > > > > these 10 million messages to travel all the way from the producer to the
> > > > > > > > consumers? That's the throughput performance number I'm interested in."
> > > > > > >
> > > > > > > > I surely won't tell him: "Hey, that's easy: you have 4 partitions, each
> > > > > > > > partition according to LinkedIn can handle 23 messages per second, so we
> > > > > > > > are looking at a throughput of 92 messages per second here!"
> > > > > > >
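Taking the per-partition figure literally shows why it cannot serve as a capacity number. Assuming (hypothetically) that the 4 partitions are drained fully in parallel at 23 messages/s each, delivery time is simply messages divided by the aggregate rate; `seconds_to_deliver` is an illustrative name:

```python
def seconds_to_deliver(num_messages: int, partitions: int,
                       per_partition_rate: float) -> float:
    """Time to move num_messages when every partition sustains
    per_partition_rate messages/s in parallel (an idealised assumption)."""
    return num_messages / (partitions * per_partition_rate)


secs = seconds_to_deliver(10_000_000, 4, 23)
print(f"{secs:,.0f} s, about {secs / 3600:.1f} hours")  # 108,696 s, about 30.2 hours
```

This is the same formula the client's question asks for; it just needs a measured aggregate rate from the actual setup rather than an average derived from someone else's production traffic.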
> > > > > > > Cheers,
> > > > > > >
> > > > > > > M. Queen
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, Jan 6, 2022 at 12:58 PM Israel Ekpo <israele...@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Marisa
> > > > > > > >
> > > > > > > > > I think there may be some confusion about the throughput for each
> > > > > > > > > partition, and I want to explain briefly using some analogies.
> > > > > > > > >
> > > > > > > > > Using transportation as an example: if we were to pick an airline or
> > > > > > > > > ridesharing organization to describe the volume of customers it can
> > > > > > > > > support per day, we would have to look at how many total customers
> > > > > > > > > American Airlines can serve in a day, or how many customers Uber or Lyft
> > > > > > > > > can serve in a day. We would not zero in on only the number of customers
> > > > > > > > > a particular driver can serve, or the number of passengers a particular
> > > > > > > > > aircraft can serve in a day. That would be very limiting, considering the
> > > > > > > > > hundreds of thousands of aircraft or drivers actively transporting
> > > > > > > > > passengers in real time.
> > > > > > > > >
> > > > > > > > > 30 passengers per driver or aircraft per day may not sound impressive, but
> > > > > > > > > 750,000 passengers per day all together is how you should look at it.
> > > > > > > > >
> > > > > > > > > Partitions in Kafka are just a logical unit for organizing and storing data
> > > > > > > > > within a Kafka topic. You should not base your analysis on just what a
> > > > > > > > > subunit of storage is able to support.
> > > > > > > > >
> > > > > > > > > I would recommend taking a look at Kafka Summit talks on performance and
> > > > > > > > > benchmarks to get some understanding of what Kafka is able to do and of the
> > > > > > > > > applicable use cases in the Financial Services industry.
> > > > > > > > >
> > > > > > > > > A lot of reputable organizations already trust Kafka today for their needs,
> > > > > > > > > so this is already proven:
> > > > > > > >
> > > > > > > > https://kafka.apache.org/powered-by
> > > > > > > >
> > > > > > > > I hope this helps.
> > > > > > > >
> > > > > > > > Israel Ekpo
> > > > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > > > https://izzyacademy.com/
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Thu, Jan 6, 2022 at 10:01 AM Marisa Queen <marisa.queen...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Cheers from NYC!
> > > > > > > > >
> > > > > > > > > > I'm trying to give a performance number to a potential client (from the
> > > > > > > > > > financial market) who asked me the following question:
> > > > > > > > > >
> > > > > > > > > > *"If I have a Kafka system set up in the best way possible for
> > > > > > > > > > performance, what is an approximate number that I can have in mind for the
> > > > > > > > > > throughput of this system?"*
> > > > > > > > > >
> > > > > > > > > > The client proceeded to say:
> > > > > > > > > >
> > > > > > > > > > *"What I want to know specifically is how many messages per second I can
> > > > > > > > > > send from one side of my distributed system to the other side with Apache
> > > > > > > > > > Kafka."*
> > > > > > > > > >
> > > > > > > > > > And he concluded with:
> > > > > > > > > >
> > > > > > > > > > *"To give you an example, let's say I have 10 million messages that I need
> > > > > > > > > > to send from producers to consumers. Let's assume I have 1 topic, 1
> > > > > > > > > > producer for this topic, 4 partitions for this topic and 4 consumers, one
> > > > > > > > > > for each partition. What I would like to know is: How long is it going to
> > > > > > > > > > take for these 10 million messages to travel all the way from the producer
> > > > > > > > > > to the consumers? That's the throughput performance number I'm interested
> > > > > > > > > > in."*
> > > > > > > > >
> > > > > > > > > > I read in a Reddit post yesterday (for some reason I can't find the post
> > > > > > > > > > anymore) that Kafka is able to handle 7 trillion messages per day. The
> > > > > > > > > > LinkedIn article about it says:
> > > > > > > > > >
> > > > > > > > > > *"We maintain over 100 Kafka clusters with more than 4,000 brokers, which
> > > > > > > > > > serve more than 100,000 topics and 7 million partitions. The total number
> > > > > > > > > > of messages handled by LinkedIn’s Kafka deployments recently surpassed 7
> > > > > > > > > > trillion per day."*
> > > > > > > > > >
> > > > > > > > > > The OP of the Reddit post went on to say that WhatsApp is handling around
> > > > > > > > > > 64 billion messages per day (740,000 msgs per sec x 24 x 60 x 60) and that
> > > > > > > > > > 7 trillion for LinkedIn is a huge number, giving a whopping 81 million
> > > > > > > > > > messages per second for LinkedIn. But that doesn't matter for my question.
> > > > > > > > >
> > > > > > > > > > 7 trillion messages divided by 7 million partitions gives us 1 million
> > > > > > > > > > messages per day per partition. So to calculate the throughput we do:
> > > > > > > > > >
> > > > > > > > > >     1 million divided by 60 divided by 60 divided by 24 => *23 messages per second per partition*
> > > > > > > > >
> > > > > > > > > > We'll all agree that 23 messages per second per partition is very low
> > > > > > > > > > throughput performance, so I can't give this number to my potential client.
> > > > > > > > >
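Redoing the arithmetic above explicitly: the WhatsApp and LinkedIn figures check out, but spreading 1 million messages over the 86,400 seconds of a day gives about 11.6 messages per second per partition; the 23 figure would correspond to a 12-hour day. Either way, it is an average of production traffic across LinkedIn's clusters, not a measure of what a single partition can sustain:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

whatsapp_per_day = 740_000 * SECONDS_PER_DAY         # msgs/day
linkedin_per_sec = 7 * 10**12 / SECONDS_PER_DAY      # msgs/s across all clusters
per_partition_per_day = 7 * 10**12 // (7 * 10**6)    # msgs/day per partition
per_partition_per_sec = per_partition_per_day / SECONDS_PER_DAY

print(f"{whatsapp_per_day:,}")         # 63,936,000,000
print(f"{linkedin_per_sec:,.0f}")      # 81,018,519
print(f"{per_partition_per_day:,}")    # 1,000,000
print(f"{per_partition_per_sec:.1f}")  # 11.6
```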
> > > > > > > > > > So my question is: *What number should I give to my potential client?*
> > > > > > > > > > Note that he is a stubborn and strict bank CTO, so he won't accept vague
> > > > > > > > > > talk from me. He wants a mathematical answer using the scientific method.
> > > > > > > > > >
> > > > > > > > > > Has anyone been in my shoes and can shed some light on this Kafka
> > > > > > > > > > throughput performance topic?
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > M. Queen
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
