Hi Kamaal,
I did a quick test with a local Kafka in Docker. With parallelism 1, I can
process 20k messages of size 4KB in about 1 minute. So if you use a parallelism
of 15, I'd expect it to take below 10s even with a bigger data skew.
What I recommend you do is start from scratch and just work
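For reference, a bare Kafka-to-Kafka passthrough job is enough to reproduce such a
baseline. The following is only a minimal sketch, assuming the universal Kafka
connector, a local broker, and hypothetical topic names "input-topic"/"output-topic":

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaPassthroughBenchmark {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Match the parallelism to the number of Kafka partitions (15 in this thread).
        env.setParallelism(15);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumption: local Docker Kafka
        props.setProperty("group.id", "throughput-test");

        // Source: read raw strings from the input topic.
        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);

        // Sink: write the same strings back out, no business logic in between.
        FlinkKafkaProducer<String> sink =
                new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), props);

        env.addSource(source).addSink(sink);

        env.execute("kafka-passthrough-throughput-test");
    }
}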
Hi Robert,
I have removed all the business logic (keyBy and window) operator code and just
had a source and sink to test it.
The throughput is 20K messages in 2 minutes. It is a simple read from a source
(Kafka topic) and write to a sink (Kafka topic). Don't you think 2 minutes is
also not a better
Hi Kamaal,
I would first suggest understanding the performance bottleneck, before
applying any optimizations.
Idea 1: Are your CPUs fully utilized?
If yes, good, then scaling up will probably help.
If not, then there's another inefficiency.
Idea 2: How fast can you get the data into your job, with
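One way to check Idea 2 in isolation is to read from the topic and throw everything
away, so only the ingestion path is measured. A rough sketch, assuming the universal
Kafka connector and placeholder broker/topic/group names:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class SourceOnlyThroughputTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.setProperty("group.id", "source-only-test");

        env.addSource(new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props))
           // DiscardingSink drops every record, so the job measures pure ingestion speed.
           .addSink(new DiscardingSink<>());

        env.execute("source-only-throughput-test");
    }
}

If even this job is slow, the bottleneck is in the Kafka/network/deserialization path
rather than in the keyBy/window logic.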
Hi Arvid,
The throughput has decreased further after I removed all the rebalance() calls.
The performance has gone from 14 minutes for 20K messages to 20 minutes for 20K
messages.
Below are the tasks that the Flink application is performing. I am using keyBy
and window operations. Do you think a
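For context, a keyBy plus window pipeline of the kind described here typically has
the following shape. This is a purely hypothetical illustration (key field, window
size, and aggregation are made up, not the actual business logic):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyedWindowExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L))
           // keyBy forces a network shuffle: each record is routed to the task
           // that owns its key group.
           .keyBy(0)
           // Group records per key into 10-second processing-time windows (example size).
           .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
           // Incrementally sum the value field so only one element per key is kept in state.
           .sum(1)
           // Discard results here; in the real job this would be the Kafka sink.
           .addSink(new DiscardingSink<>());

        env.execute("keyed-window-example");
    }
}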
Hi Mohammed,
Something is definitely wrong in your setup. You can safely say that you can
process 1k records per second per core with Kafka and light processing, so you
shouldn't even need to go distributed in your case.
Do you perform any heavy computation? What is your flatMap doing? Are you
em
Hi Fabian,
Just an update.
Problem 2:
Caused by: org.apache.kafka.common.errors.NetworkException
This is resolved. It happened because we exceeded the number of allowed
partitions for the Kafka cluster (AWS MSK cluster). We have deleted
unused topics and partitions to resolve the issue.
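In case someone else runs into the same MSK limit, the current partition count can be
checked programmatically. A small sketch using the Kafka AdminClient (the broker
address is a placeholder):

import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionCount {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // List all topics, then sum up the partitions of each one.
            Set<String> topics = admin.listTopics().names().get();
            Map<String, TopicDescription> descriptions = admin.describeTopics(topics).all().get();

            int totalPartitions = descriptions.values().stream()
                    .mapToInt(d -> d.partitions().size())
                    .sum();

            System.out.println("Topics: " + topics.size() + ", total partitions: " + totalPartitions);
        }
    }
}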
Hi Mohammed,
200 records should definitely be doable. The first thing you can do is remove the
print sink, because it increases the load on your cluster due to the additional IO
operation and, secondly, prevents Flink from fusing operators.
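A minimal illustration of that point, with a made-up pipeline: a debugging print()
would otherwise become an extra stdout sink operator in the job graph.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class NoPrintSinkExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3)
           .map(i -> i * 2)
           // .print() here would add a separate stdout sink operator (extra IO on the
           // task managers); for load tests, discard or write to the real sink instead.
           .addSink(new DiscardingSink<>());

        env.execute("no-print-sink-example");
    }
}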
I am interested to see the updated job graph after
Hi Mohammed,
Without diving too much into your business logic, a thing which catches my eye
is the partitioning you are using. In general, all calls to `keyBy` or
`rebalance` are very expensive because all the data is shuffled across
downstream tasks. Flink tries to fuse operators with the same keyG
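A tiny sketch of that difference, with made-up map/filter steps: forward connections
let Flink chain operators into one task, while keyBy (or rebalance) inserts a
serialization plus network shuffle between them.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

public class ChainingVsShuffle {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3))
           // map + filter use forward connections, so Flink chains them with the
           // source into one task: no serialization, no network hop.
           .map(t -> Tuple2.of(t.f0, t.f1 * 2))
           .returns(Types.TUPLE(Types.STRING, Types.INT))
           .filter(t -> t.f1 > 0)
           // keyBy breaks the chain: records are serialized and shuffled to the
           // downstream task that owns their key group, which is the expensive part.
           .keyBy(0)
           .sum(1)
           .addSink(new DiscardingSink<>());

        env.execute("chaining-vs-shuffle");
    }
}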
Hi,
Apologies for the long message; I want to explain the issue in detail.
We have a Flink (version 1.8) application running on AWS Kinesis Analytics. The
application has a source which is a Kafka topic with 15 partitions (AWS Managed
Streaming Kafka) and the sink is again a Kafka topic with 15 partitions