Hi Luke,
> The solution I can think of is to create only one partition for the topic.
That would work, but then I lose the benefits of the partitions.
> Or you can create 4 consumers in one group, to consume from 4 partitions.
That works, too.
That does not work, because I need only one consumer
Hi Roger,
I am going to briefly add to what others have already stated.
The recommendations made by Sunil and Luke are based on the fundamentals of
how Kafka stores and organizes events as well as the retrieval mechanism of
consumer groups.
Without additional details about the objectives of your
Cheers from NYC!
I'm trying to give a performance number to a potential client (from the
financial market) who asked me the following question:
*"If I have a Kafka system set up in the best way possible for performance,
what is an approximate number that I can have in mind for the throughput of
th
Hi Roger,
What consumer are you using?
Is there a chance consumer threads would help?
For example, the Logstash Kafka consumer has a configurable number of
threads under each consumer instance. That may help to some extent.
Regards,
Sunil.
On Thu, 6 Jan 2022 at 7:27 PM, Roger Kasinsky
wrote:
> Hi Luke,
>
Hi Marisa,
I think there may be some confusion about the throughput for each partition,
and I want to explain briefly using some analogies.
Using transportation as an example: if we were to pick an airline or
ridesharing organization to describe the volume of customers they can
support per day, we would
Hi Israel,
Thanks for your detailed explanation. I understand now that Kafka can't
give me any guarantees with regards to ordering if my single consumer is
consuming from multiple partitions.
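For anyone following along, the per-partition ordering guarantee can be illustrated with a small sketch of how a keyed record is mapped to a partition. (The real Java client uses murmur2 hashing; the md5-based hash below is just a deterministic stand-in for illustration.)

```python
# Sketch: why ordering is only guaranteed per partition. Kafka's default
# partitioner for keyed records maps a key to a partition via a hash, so
# all records with the same key land in the same partition, where their
# relative order is preserved for whichever consumer owns that partition.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic stand-in for the client's murmur2-based partitioner.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every record keyed "order-42" goes to the same partition -> ordered there.
assert partition_for(b"order-42") == partition_for(b"order-42")
# Different keys may land on different partitions -> no ordering across them.
print(f"'order-42' records always go to partition {partition_for(b'order-42')}")
```

This is also why a single consumer reading from all four partitions sees an interleaving: each partition is ordered internally, but Kafka makes no promise about the merge order between them.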
Hi Sunil,
Thanks for the thread suggestion. However I don't think increasing or
decreasing the number of
Hi, Marisa.
Kafka is well-designed to make full use of system resources, so I think
calculating based on the machine's specs is a good start.
Let's say we have servers with 10Gbps full-duplex NIC.
Also, let's say we set the topic's replication factor to 3 (so the cluster
will have minimum 3 servers),
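Okada's calculation is cut off above, but here is one back-of-envelope sketch along those lines. The assumptions are mine, not from the thread: the leader's NIC divided by the replication factor is the bottleneck, and messages average 500 bytes. Those assumptions happen to land near the 833k/sec figure quoted later in the thread.

```python
# Back-of-envelope throughput estimate (a sketch, not a benchmark).
# Assumptions (illustrative): the 10 Gbps NIC is the bottleneck, and with
# replication factor 3, each produced byte is also shipped to two follower
# brokers, so usable producer bandwidth is roughly NIC bandwidth / 3.
NIC_GBPS = 10                # full-duplex 10 Gbps NIC
REPLICATION_FACTOR = 3
AVG_MSG_BYTES = 500          # assumed average message size

nic_bytes_per_sec = NIC_GBPS * 1e9 / 8               # 1.25 GB/s
usable_bytes_per_sec = nic_bytes_per_sec / REPLICATION_FACTOR
msgs_per_sec = usable_bytes_per_sec / AVG_MSG_BYTES
print(f"{msgs_per_sec:,.0f} msgs/sec")               # prints: 833,333 msgs/sec
```

Change the assumed message size and the estimate moves proportionally, which is exactly why the thread keeps asking for the unknown parameters first.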
Hi Israel,
Your email is great, but I'm afraid to forward it to my customer because it
doesn't answer his question.
I'm hoping that other members of this list will be able to give me a more
NUMERIC answer; let's wait and see.
Just to give you some follow up on your answer, when you say:
> 30 p
There are a few unknown parameters here that might influence the answer,
though. Off the top of my head, at least:
- How much replication of the data is needed (for high availability), and
how many acks for the producer? (If fire-and-forget it can be faster, if
need to replicate and ack from 3 broker
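The replication/acks trade-off listed above maps directly onto producer configuration. A sketch using kafka-python parameter names (the values are illustrative; no producer is actually created here, since that requires a running broker):

```python
# Producer acks trade-off (kafka-python parameter names):
#   acks=0     -> fire-and-forget: fastest, can silently lose messages
#   acks=1     -> leader ack only: middle ground
#   acks="all" -> wait for all in-sync replicas: slowest, most durable
fire_and_forget = {"bootstrap_servers": "localhost:9092", "acks": 0}
balanced = {"bootstrap_servers": "localhost:9092", "acks": 1}
durable = {"bootstrap_servers": "localhost:9092", "acks": "all",
           "retries": 5}  # retry transient broker errors

# e.g. producer = KafkaProducer(**durable)  # requires a running broker
```

Benchmarking with `acks=0` and reporting that number to a client who needs replicated, acknowledged writes would be misleading, which is the point being made here.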
Hi Okada,
Thanks for your reply. Finally I see some numbers! I love numbers :)
I've shown your email to my boss (I hope he will hire me to do this
project) and he said the following:
"I would like to see this 833k/sec number for myself. Am I asking too much?
:) Can you set up a very basic and si
Hi Joris,
I've spoken to him. His answers are below:
On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters
wrote:
> There are a few unknown parameters here that might influence the answer,
> though. Off the top of my head, at least:
> - How much replication of the data is needed (for high availability),
I'd just follow the instructions in https://kafka.apache.org/quickstart to
set up Kafka and Zookeeper on a single node, by running the Java processes
directly. Or you can run them in Docker.
For the producer and consumer I'd personally use Python, as it's the
easiest to get going. You may want to look at
h
Hi Joris,
Thank you so much. I plan to write a Java Consumer and a Java Producer, for
my benchmark. Do you recommend an example that I can use as a reference to
write my basic Java producer and simple Java consumer? I'll for sure share
the throughput number I get with the community. Maybe even write
These tutorials - though quite a bit outdated - seem quite useful:
http://cloudurable.com/blog/kafka-tutorial-kafka-producer/index.html (and
the follow-ups).
This ends up being close to how I write it in Java, and tutorial 13 talks
about batching and acks etc., which you'll need in order to tune to max
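For reference, the knobs that tutorial covers look like this in kafka-python terms. The values are illustrative starting points, not tuned recommendations, and creating the producer itself needs a running broker:

```python
# Throughput-oriented producer settings (kafka-python parameter names).
throughput_config = dict(
    bootstrap_servers="localhost:9092",
    acks=1,                  # leader-only ack: faster than acks="all"
    linger_ms=20,            # wait up to 20 ms so batches can fill
    batch_size=64 * 1024,    # 64 KiB batches (kafka-python default is 16 KiB)
    compression_type="lz4",  # trade CPU for network/disk bandwidth
)
# producer = KafkaProducer(**throughput_config)  # needs a running broker
print(throughput_config["batch_size"])  # prints: 65536
```

Larger batches and a small linger window usually raise throughput at the cost of per-message latency, so the "right" values depend on which of the two the benchmark is meant to showcase.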
Hi Joris,
Thank you so much, friend!
> I appreciate that setting up everything on localhost will be easier and
lead to big numbers, but bear in mind that it's typically all the other
real-life stuff (remote connections, replication, at-least once, ...) that
causes massive slowdowns compared to lo
Marisa,
I do not agree with your assessment. There are several factors that could
influence your performance numbers even with localhost. Your project should
be configured based on your own needs.
Your throughput could go up or down depending on how you are configured,
based on what is important
Hi Israel,
> You can achieve any performance benchmark you are willing to pay for.
Thanks for your email. Allow me to respectfully disagree. I believe that
some systems are better than others when it comes to performance. The idea
that I can just take a slow system, multiply by 1 million, and the
Thanks for your response Marisa.
This has been a very interesting discussion and I appreciate it.
It is a bit of a challenge, in the sense that I wish I had a demo ready to
go with a similar use case and expectations, to easily explain what I have
been trying to convey.
I am always ready for a chall
Marisa, you might consider engaging someone at Confluent, maybe they can
give you some case studies or whitepapers from similar use-cases in the
financial industry (and yes, Kafka is used in the financial industry). A
client asking you to "prove that Kafka performs/scales" seems like an
unusual
Wow, that's awesome! I wasn't expecting that. I truly appreciate your help
and professionalism.
> Let me find some time soon and I will do a video on that scenario
optimized primarily for low latency and throughput. I will also compare how
this performs when adjusted for durability and high availa
Hi Alex,
> Furthermore, setting up a localhost pub/sub demo on a single machine
(your laptop?) is so far removed from a real-world scenario I can't imagine
how any numbers derived from that would be useful.
I can't imagine either. That's why I'm planning to run this on a lab Linux
machine with 8