Hi Karthick,
We've experienced a similar issue before. What we did at the time was define
multiple topics, each with a different number of partitions, so the topics with
more partitions get higher parallelism for processing.
You can further divide the topics into groups where each group has a similar
number of partitions. Each group can then be defined as the source of a
separate Flink data stream, and those streams can run in parallel with
different parallelisms.
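
A rough sketch of what that could look like, assuming the flink-connector-kafka
KafkaSource API (topic names, broker address, group id, parallelism values, and
the print() placeholder are all made up):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PerGroupParallelismJob {

    // Hypothetical helper: builds one KafkaSource per topic group.
    private static KafkaSource<String> sourceFor(String... topics) {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics(topics)
                .setGroupId("device-processor")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Topic group with many partitions -> read and process with higher parallelism.
        env.fromSource(sourceFor("devices-high-volume"),
                       WatermarkStrategy.noWatermarks(), "high-volume-devices")
           .setParallelism(8)
           .print()              // placeholder for the real processing pipeline
           .setParallelism(8);

        // Topic group with few partitions -> lower parallelism is enough.
        env.fromSource(sourceFor("devices-low-volume"),
                       WatermarkStrategy.noWatermarks(), "low-volume-devices")
           .setParallelism(2)
           .print()
           .setParallelism(2);

        env.execute("per-group-parallelism");
    }
}

The useful parallelism of each source is bounded by the number of partitions in
its topic group, which is why grouping topics with a similar partition count helps.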

------------------ Original ------------------
From: Giannis Polyzos <ipolyzos...@gmail.com>
Date: Sat, Sep 16, 2023 11:52 PM
To: Karthick <ibmkarthickma...@gmail.com>
Cc: Gowtham S <gowtham.co....@gmail.com>, user <user@flink.apache.org>
Subject: Re: Urgent: Mitigating Slow Consumer Impact and Seeking
Open-Source Solutions in Apache Kafka Consumers



Can you provide some more context on what your Flink job will be doing? There
might be some things you can do to fix the data skew on the Flink side, but
first you want to start with Kafka. For starters, you need to better size and
estimate the number of partitions you will need on the Kafka side in order to
process 1000+ messages/second.
The number of partitions also defines the maximum parallelism for the Flink job
reading from Kafka.
If you know your "hot devices" in advance, you might want to use a custom
partitioner that spreads those devices across separate partitions.
Overall this is somewhat of a trial-and-error process. You might also want to
check that the partitions are evenly balanced among your brokers and don't put
too much stress on particular brokers.
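
For illustration only, a minimal sketch of such a custom partitioner, assuming
the hot device IDs and the partitions reserved for them are known up front
(class name, device IDs, and partition numbers below are made up):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical partitioner: pins known hot devices to dedicated partitions and
// hashes every other device ID over the remaining, non-reserved partitions.
public class HotDevicePartitioner implements Partitioner {

    // Assumption: hot devices and their reserved partitions are known in advance.
    private static final Map<String, Integer> HOT_DEVICE_PARTITIONS =
            Map.of("device-42", 8, "device-77", 9);

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String deviceId = key == null ? "" : key.toString();

        Integer reserved = HOT_DEVICE_PARTITIONS.get(deviceId);
        if (reserved != null && reserved < numPartitions) {
            return reserved;
        }
        if (keyBytes == null) {
            return 0;  // keyless records: just use the first partition
        }
        // Spread everything else over the partitions that are not reserved
        // (this assumes the reserved partitions are the highest-numbered ones).
        int nonReserved = Math.max(numPartitions - HOT_DEVICE_PARTITIONS.size(), 1);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % nonReserved;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

The producer would then pick it up via the partitioner.class config.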


Best


On Sat, Sep 16, 2023 at 6:03 PM Karthick <ibmkarthickma...@gmail.com> wrote:

Hi Gowtham, I agree with you.

I'm eager to resolve the issue or gain a better understanding. Your assistance 
would be greatly appreciated.

If there are any additional details or context needed to address my query 
effectively, please let me know, and I'll be happy to provide them.

Thank you in advance for your time and consideration. I look forward to hearing 
from you and benefiting from your expertise.


Thanks and Regards
Karthick.


On Sat, Sep 16, 2023 at 11:04 AM Gowtham S <gowtham.co....@gmail.com> wrote:

Hi Karthick,

This appears to be a common challenge with slow consumers. Those with relevant
experience in addressing such matters should be able to help.


Thanks and regards,
Gowtham S

On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos <ipolyzos...@gmail.com> wrote:

Hi Karthick,

On a high level this seems like a data skew issue, where some partitions have
way more data than others?
What is the number of your devices? How many messages are you processing?
Most of the things you share above sound like you are looking for suggestions
around load distribution for Kafka, i.e. the number of partitions, how to
distribute your device data, etc.
It would also be good to share what your Flink job is doing, as I don't see
anything mentioned around that. Are you observing backpressure in the Flink UI?


Best


On Fri, Sep 15, 2023 at 3:46 PM Karthick <ibmkarthickma...@gmail.com> wrote:


Dear Apache Flink Community,


I am writing to urgently address a critical challenge we've encountered in our 
IoT platform that relies on Apache Kafka and real-time data processing. We 
believe this issue is of paramount importance and may have broad implications 
for the community.


In our IoT ecosystem, we receive data streams from numerous devices, each 
uniquely identified. To maintain data integrity and ordering, we've 
meticulously configured a Kafka topic with ten partitions, ensuring that each 
device's data is directed to its respective partition based on its unique 
identifier. This architectural choice has proven effective in maintaining data 
order, but it has also unveiled a significant problem:


Slowness in processing one device's data interferes with other devices' data,
causing a detrimental ripple effect throughout our system.

To put it simply, when a single device experiences processing delays, it acts
as a bottleneck within the Kafka partition, leading to delays in processing
data from other devices sharing the same partition. This issue undermines the
efficiency and scalability of our entire data processing pipeline.

Additionally, I would like to highlight that we are currently using the default 
partitioner for choosing the partition of each device's data. If there are 
alternative partitioning strategies that can help alleviate this problem, we 
are eager to explore them.
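
For reference, our producer side currently looks roughly like the sketch below
(broker address, topic name, and payload are placeholders); the device's unique
identifier is the record key, so the default partitioner hashes each device onto
one of the ten partitions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DeviceEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");              // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String deviceId = "device-123";                          // hypothetical device
            // Keying by device ID preserves per-device ordering but ties each
            // device to a single partition.
            producer.send(new ProducerRecord<>("iot-events", deviceId, "{\"temp\":21.5}"));
        }
    }
}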

We are in dire need of a high-scalability solution that not only ensures each 
device's data processing is independent but also prevents any interference or 
collisions between devices' data streams. Our primary objectives are:

1. Isolation and Independence: We require a strategy that guarantees one 
device's processing speed does not affect other devices in the same Kafka 
partition. In other words, we need a solution that ensures the independent 
processing of each device's data.

2. Open-Source Implementation: We are actively seeking pointers to open-source
implementations or references to working solutions that address this specific
challenge within the Apache ecosystem. Any existing projects, libraries, or
community-contributed solutions that align with our requirements would be
immensely valuable.

We recognize that many Apache Flink users face similar issues and may have 
already found innovative ways to tackle them. We implore you to share your 
knowledge and experiences on this matter. Specifically, we are interested in:


- Strategies or architectural patterns that ensure independent processing of 
device data.

- Insights into load balancing, scalability, and efficient data processing 
across Kafka partitions.

- Any existing open-source projects or implementations that address similar 
challenges.


We are confident that your contributions will not only help us resolve this 
critical issue but also assist the broader Apache Flink community facing 
similar obstacles.


Please respond to this thread with your expertise, solutions, or any relevant 
resources. Your support will be invaluable to our team and the entire Apache 
Flink community.

Thank you for your prompt attention to this matter.




Thanks & Regards

Karthick.
