Hm, it's an optimization for the "first layer", so if the bottleneck is in
the "second layer" (i.e. the DB write) as you mentioned, I don't think it
will make much difference.
On Tue, Dec 22, 2020 at 16:02 Yana K wrote:
I thought about it but then we don't have much time - will it optimize
performance?
On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada wrote:
About "first layer" right?
Then make sure you don't call get() on the result of Producer#send()
for each message, because that defeats the producer's ability to batch.
Kafka producer batches messages by default and it's very efficient, so if
you produce asynchronously, it rarely beco
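To illustrate the point about not blocking on every send, here is a toy model in Python. It is only a sketch: `ToyBatchingProducer` and its `send`/`flush`/`get` methods are hypothetical stand-ins written for this example, not the Kafka client API, and the "network request" is just a counter. It assumes that waiting on a result forces whatever is buffered to be sent immediately.

```python
from concurrent.futures import Future


class ToyBatchingProducer:
    """Hypothetical stand-in for a batching producer (NOT the Kafka client API)."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pending = []        # futures for records buffered but not yet sent
        self.requests_sent = 0   # number of simulated network requests

    def send(self, record):
        """Buffer a record and return a future; flush once the batch is full."""
        future = Future()
        self.pending.append(future)
        if len(self.pending) >= self.batch_size:
            self.flush()
        return future

    def flush(self):
        """Send everything buffered so far as one simulated request."""
        if self.pending:
            self.requests_sent += 1
            for future in self.pending:
                future.set_result(True)
            self.pending = []

    def get(self, future):
        """Simulate blocking on a result: the buffered batch must go out first."""
        if not future.done():
            self.flush()
        return future.result()


def produce_blocking(records):
    """Wait for the result after every single send."""
    producer = ToyBatchingProducer()
    for record in records:
        producer.get(producer.send(record))
    return producer.requests_sent


def produce_async(records):
    """Send everything first, then flush the final partial batch."""
    producer = ToyBatchingProducer()
    futures = [producer.send(record) for record in records]
    producer.flush()
    assert all(f.done() for f in futures)
    return producer.requests_sent
```

In this model, blocking after each send produces one request per record, while the asynchronous variant produces one request per full batch.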
Thanks!
Also are there any producer optimizations anyone can think of in this
scenario?
On Mon, Dec 21, 2020 at 8:58 AM Joris Peeters
wrote:
I'd probably just do it by experiment for your concrete data.
Maybe generate a few million synthetic data rows, and for-each-batch insert
them into a dev DB, with an outer grid search over various candidate batch
sizes. You're looking to optimise for flat-out rows/s, so whichever batch
size wins (
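A minimal sketch of that kind of experiment, assuming an in-memory SQLite database as a stand-in for the dev DB. The table name `events`, the row shape, and the candidate batch sizes are made up for illustration; the numbers it produces say nothing about a real database, so the same loop would need to be pointed at the actual dev DB and driver.

```python
import sqlite3
import time


def rows_per_second(batch_size, n_rows=20_000):
    """Insert n_rows synthetic rows in batches of batch_size; return rows/s."""
    conn = sqlite3.connect(":memory:")  # stand-in for a dev DB (assumption)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, f"payload-{i}") for i in range(n_rows)]  # synthetic data

    start = time.perf_counter()
    for offset in range(0, n_rows, batch_size):
        batch = rows[offset:offset + batch_size]
        conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        conn.commit()  # one commit per batch
    elapsed = time.perf_counter() - start

    inserted = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    assert inserted == n_rows
    return n_rows / elapsed


def grid_search(batch_sizes):
    """Measure each candidate batch size and return the fastest one."""
    results = {size: rows_per_second(size) for size in batch_sizes}
    best = max(results, key=results.get)
    return best, results
```

For example, `grid_search([1, 10, 100, 1000])` returns the winning batch size together with the measured rows/s for every candidate.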
Thanks Haruki and Joris.
Haruki:
Thanks for the detailed calculations. Really appreciate it. What tool/lib
is used to load test Kafka?
So we have one consumer group and are running 7 instances of the
application - that should be good enough, correct?
Joris:
Great point.
DB insert is a bottleneck (and
Do you know why your consumers are so slow? 12E6 msg/hour is about 3,333 msg/s,
which is not very high from a Kafka point of view. As you're doing database
inserts, I suspect that is where the bottleneck lies.
If, for example, you're doing a single-row insert in a SQL DB for every
message then this would i
About load test:
I think it'd be better to monitor per-message processing latency and estimate
the required partition count based on it, because it determines the max
throughput per single partition.
- Say you have to process 12 million messages/hour = about 3,333 messages/sec.
- If you have 7 partitions (thus 7
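The estimate above can be sketched as a small calculation. It assumes each partition is processed sequentially by one consumer, so a partition handles at most `1 / latency` messages per second. The 5 ms latency in the usage note is a made-up figure for illustration, not a measurement from this thread.

```python
import math


def required_partitions(messages_per_hour, latency_seconds):
    """Partitions needed if each partition processes messages one at a time."""
    target_rate = messages_per_hour / 3600       # messages/sec to sustain
    per_partition_rate = 1 / latency_seconds     # max messages/sec per partition
    return math.ceil(target_rate / per_partition_rate)


def max_latency_seconds(messages_per_hour, partitions):
    """Largest per-message latency that still keeps up with the target rate."""
    target_rate = messages_per_hour / 3600
    return partitions / target_rate
```

With 12 million messages/hour and a hypothetical 5 ms per message, `required_partitions(12_000_000, 0.005)` gives 17. Going the other way, `max_latency_seconds(12_000_000, 7)` is about 0.0021 s, so 7 partitions only keep up if each message takes roughly 2.1 ms or less.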
So as the next step I see increasing the partition count of the 2nd topic - do I
also increase the number of consumer instances, or keep it at 7?
Anything else (besides researching those libs)?
Are there any good tools for load testing Kafka?
On Sun, Dec 20, 2020 at 7:23 PM Haruki Okada wrote:
It depends on how you manually commit offsets.
Auto-commit basically commits offsets asynchronously, so as long as
you commit manually in the same way, there should not be much difference.
And, generally, offset-commit mode doesn't make much difference in
performance regardless of manual/auto o
Thank you so much Marina and Haruka.
Marina's response:
- When you say "if you are sure there is no room for perf optimization of
the processing itself:" - do you mean code level optimizations? Can you
please explain?
- On the second topic you say "I'd say at least 40" - is this based on 12
mil
Hi.
Yeah, Spring-Kafka processes messages sequentially, so the consumer
throughput would be capped by database latency per single process.
One possible solution is creating an intermediate topic (or altering the source
topic) with many more partitions, as Marina suggested.
I'd like to suggest an
The way I see it - you can only do a few things - if you are sure there is no
room for perf optimization of the processing itself:
1. speed up your processing per consumer thread: which you already tried by
splitting your logic into a 2-step pipeline instead of 1-step, and delegating
the work o
Hi Ramz,
A good rule of thumb has been no more than 4,000 partitions per broker and no
more than 100,000 in a cluster.
This includes all replicas, and it's related more to Kafka internals than it is
to resource usage, so I strongly advise not pushing these limits.
Otherwise, the usual reasons for sc