Thanks Liu Ron for the suggestion. Could you please give any pointers/references for a custom partitioning strategy? We are currently using murmur hashing with the device's unique id. It would be helpful if you could point us to any other strategies.
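To make the question concrete, here is a rough sketch of the kind of partitioner we imagine, along the lines Giannis suggested: known hot devices get their own reserved partitions and everything else keeps the murmur hashing we use today. The class name, the hard-coded hot-device list, and the partition layout are placeholders only, not a tested design.

import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Sketch only: pin known "hot" devices to reserved partitions and murmur-hash
// the rest, assuming every record is keyed by its device id.
public class HotDeviceAwarePartitioner implements Partitioner {

    // Placeholder list; in practice this would come from producer configuration.
    private static final List<String> HOT_DEVICES = List.of("device-42", "device-77");

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Reserve one partition per hot device, but always leave at least
        // one partition for everything else.
        int reserved = Math.min(HOT_DEVICES.size(), numPartitions - 1);

        int hotIndex = HOT_DEVICES.indexOf(String.valueOf(key));
        if (hotIndex >= 0 && hotIndex < reserved) {
            return hotIndex;  // hot device gets its own partition
        }
        // All other devices share the remaining partitions via murmur hashing,
        // roughly what the default keyed partitioning already does.
        return reserved + Utils.toPositive(Utils.murmur2(keyBytes)) % (numPartitions - reserved);
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}

If this direction is sensible, it would presumably be registered through the producer's partitioner.class property; any pointers on pitfalls with this kind of approach would be appreciated.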
Thanks and regards
Karthick.

On Mon, Sep 18, 2023 at 9:18 AM liu ron <ron9....@gmail.com> wrote:

> Hi, Karthick
>
> It looks like a data skewing problem, and I think one of the easiest and most efficient ways to address it is to increase the number of partitions and see how it works first, like try expanding by 100 first.
>
> Best,
> Ron
>
> Karthick <ibmkarthickma...@gmail.com> wrote on Sun, Sep 17, 2023 at 17:03:
>
>> Thanks Wei Chen, Giannis for the time.
>>
>>> For starters, you need to better size and estimate the required number of partitions you will need on the Kafka side in order to process 1000+ messages/second. The number of partitions should also define the maximum parallelism for the Flink job reading from Kafka.
>>
>> Thanks for the pointer. Can you please guide us on what factors we need to consider regarding this?
>>
>>> use a custom partitioner that spreads those devices to somewhat separate partitions.
>>
>> Please suggest a working solution regarding the custom partitioner to distribute the load. It would be helpful.
>>
>>> What we were doing at that time was to define multiple topics and each has a different # of partitions
>>
>> Thanks for the suggestion. Is there any calculation for choosing the topic count, or are there any formulae/factors to determine this number? Please let us know if available; it would help us choose.
>>
>> Thanks and Regards
>> Karthick.
>>
>> On Sun, Sep 17, 2023 at 4:04 AM Wei Chen <sagitc...@qq.com> wrote:
>>
>>> Hi Karthick,
>>> We've experienced a similar issue before. What we were doing at that time was to define multiple topics, each with a different # of partitions, which means the topics with more partitions will have higher parallelism for processing.
>>> You can further divide the topics into several groups, where each group has a similar # of partitions. Each group can then be defined as the source of a Flink data stream, so the groups run in parallel with different parallelism.
>>>
>>> ------------------ Original ------------------
>>> *From:* Giannis Polyzos <ipolyzos...@gmail.com>
>>> *Date:* Sat, Sep 16, 2023 11:52 PM
>>> *To:* Karthick <ibmkarthickma...@gmail.com>
>>> *Cc:* Gowtham S <gowtham.co....@gmail.com>, user <user@flink.apache.org>
>>> *Subject:* Re: Urgent: Mitigating Slow Consumer Impact and Seeking Open-Source Solutions in Apache Kafka Consumers
>>>
>>> Can you provide some more context on what your Flink job will be doing? There might be some things you can do to fix the data skew on the Flink side, but first, you want to start with Kafka.
>>> For starters, you need to better size and estimate the required number of partitions you will need on the Kafka side in order to process 1000+ messages/second.
>>> The number of partitions should also define the maximum parallelism for the Flink job reading from Kafka.
>>> If you know your "hot devices" in advance you might wanna use a custom partitioner that spreads those devices to somewhat separate partitions.
>>> Overall this is somewhat of a trial-and-error process. You might also want to check that these partitions are evenly balanced among your brokers and don't cause too much stress on particular brokers.
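To make the multi-topic grouping Wei Chen describes more concrete, here is a minimal sketch using Flink's KafkaSource, where each topic group becomes its own source with its own parallelism. The broker address, topic names, group id, and parallelism figures are illustrative only.

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiTopicJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical topics: "devices-hot" has many partitions, "devices-normal" fewer.
        KafkaSource<String> hotSource = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("devices-hot")
                .setGroupId("iot-pipeline")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        KafkaSource<String> normalSource = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("devices-normal")
                .setGroupId("iot-pipeline")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Each group is consumed as its own source with its own parallelism.
        DataStream<String> hot = env
                .fromSource(hotSource, WatermarkStrategy.noWatermarks(), "hot-devices")
                .setParallelism(40);   // e.g. a topic with 40 partitions
        DataStream<String> normal = env
                .fromSource(normalSource, WatermarkStrategy.noWatermarks(), "normal-devices")
                .setParallelism(10);   // e.g. a topic with 10 partitions

        hot.union(normal)
           // downstream processing would go here
           .print();

        env.execute("multi-topic-iot");
    }
}

The parallelism given to each source would normally stay at or below that topic's partition count, in line with the point above that partitions cap the maximum source parallelism.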
>>> Best
>>>
>>> On Sat, Sep 16, 2023 at 6:03 PM Karthick <ibmkarthickma...@gmail.com> wrote:
>>>
>>>> Hi Gowtham i agree with you,
>>>>
>>>> I'm eager to resolve the issue or gain a better understanding. Your assistance would be greatly appreciated.
>>>>
>>>> If there are any additional details or context needed to address my query effectively, please let me know, and I'll be happy to provide them.
>>>>
>>>> Thank you in advance for your time and consideration. I look forward to hearing from you and benefiting from your expertise.
>>>>
>>>> Thanks and Regards
>>>> Karthick.
>>>>
>>>> On Sat, Sep 16, 2023 at 11:04 AM Gowtham S <gowtham.co....@gmail.com> wrote:
>>>>
>>>>> Hi Karthik
>>>>>
>>>>> This appears to be a common challenge related to a slow-consuming situation. Those with relevant experience in addressing such matters should be capable of providing assistance.
>>>>>
>>>>> Thanks and regards,
>>>>> Gowtham S
>>>>>
>>>>> On Fri, 15 Sept 2023 at 23:06, Giannis Polyzos <ipolyzos...@gmail.com> wrote:
>>>>>
>>>>>> Hi Karthick,
>>>>>>
>>>>>> on a high level seems like a data skew issue and some partitions have way more data than others?
>>>>>> What is the number of your devices? how many messages are you processing?
>>>>>> Most of the things you share above sound like you are looking for suggestions around load distribution for Kafka, i.e. number of partitions, how to distribute your device data etc.
>>>>>> It would be good to also share what your flink job is doing as I don't see anything mentioned around that. Are you observing back pressure in the Flink UI?
>>>>>>
>>>>>> Best
>>>>>>
>>>>>> On Fri, Sep 15, 2023 at 3:46 PM Karthick <ibmkarthickma...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear Apache Flink Community,
>>>>>>>
>>>>>>> I am writing to urgently address a critical challenge we've encountered in our IoT platform that relies on Apache Kafka and real-time data processing. We believe this issue is of paramount importance and may have broad implications for the community.
>>>>>>>
>>>>>>> In our IoT ecosystem, we receive data streams from numerous devices, each uniquely identified. To maintain data integrity and ordering, we've meticulously configured a Kafka topic with ten partitions, ensuring that each device's data is directed to its respective partition based on its unique identifier. This architectural choice has proven effective in maintaining data order, but it has also unveiled a significant problem:
>>>>>>>
>>>>>>> *One device's data processing slowness is interfering with other devices' data, causing a detrimental ripple effect throughout our system.*
>>>>>>>
>>>>>>> To put it simply, when a single device experiences processing delays, it acts as a bottleneck within the Kafka partition, leading to delays in processing data from other devices sharing the same partition. This issue undermines the efficiency and scalability of our entire data processing pipeline.
>>>>>>>
>>>>>>> Additionally, I would like to highlight that we are currently using the default partitioner for choosing the partition of each device's data. If there are alternative partitioning strategies that can help alleviate this problem, we are eager to explore them.
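For context on why a single slow device stalls its partition neighbours: with the default partitioner, a keyed record is placed by hashing its key (murmur2) modulo the partition count, so every device is pinned to exactly one of the ten partitions and shares it with whichever other devices hash to the same value. A small sketch of that mapping, with made-up device ids:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class DefaultKeyedPartitionExample {
    public static void main(String[] args) {
        int numPartitions = 10;  // the topic size described above
        String[] deviceIds = {"device-001", "device-002", "device-003"};

        for (String deviceId : deviceIds) {
            byte[] keyBytes = deviceId.getBytes(StandardCharsets.UTF_8);
            // Keyed records are assigned as murmur2(key) mod partition count,
            // so every record of a device always lands in the same partition.
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.printf("%s -> partition %d%n", deviceId, partition);
        }
    }
}

Re-partitioning strategies can change which devices share a partition, but as long as per-device ordering requires keyed writes, some co-location is unavoidable once there are more devices than partitions.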
>>>>>>> We are in dire need of a highly scalable solution that not only ensures each device's data processing is independent but also prevents any interference or collisions between devices' data streams. Our primary objectives are:
>>>>>>>
>>>>>>> 1. *Isolation and Independence:* We require a strategy that guarantees one device's processing speed does not affect other devices in the same Kafka partition. In other words, we need a solution that ensures the independent processing of each device's data.
>>>>>>>
>>>>>>> 2. *Open-Source Implementation:* We are actively seeking pointers to open-source implementations or references to working solutions that address this specific challenge within the Apache ecosystem. Any existing projects, libraries, or community-contributed solutions that align with our requirements would be immensely valuable.
>>>>>>>
>>>>>>> We recognize that many Apache Flink users face similar issues and may have already found innovative ways to tackle them. We implore you to share your knowledge and experiences on this matter. Specifically, we are interested in:
>>>>>>>
>>>>>>> *- Strategies or architectural patterns that ensure independent processing of device data.*
>>>>>>> *- Insights into load balancing, scalability, and efficient data processing across Kafka partitions.*
>>>>>>> *- Any existing open-source projects or implementations that address similar challenges.*
>>>>>>>
>>>>>>> We are confident that your contributions will not only help us resolve this critical issue but also assist the broader Apache Flink community facing similar obstacles.
>>>>>>>
>>>>>>> Please respond to this thread with your expertise, solutions, or any relevant resources. Your support will be invaluable to our team and the entire Apache Flink community.
>>>>>>>
>>>>>>> Thank you for your prompt attention to this matter.
>>>>>>>
>>>>>>> Thanks & Regards
>>>>>>> Karthick.
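One architectural pattern sometimes used for the isolation objective above is to re-key the stream inside Flink by device id, so per-device processing is spread across Flink subtasks independently of how devices were packed into Kafka partitions. A minimal sketch, assuming string events whose first comma-separated field is the device id; the topic name, broker address, and parsing details are placeholders:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PerDeviceKeyedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")      // placeholder address
                .setTopics("device-events")              // hypothetical topic name
                .setGroupId("iot-pipeline")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "device-events")
           // Re-key by device id so per-device work is spread over all subtasks,
           // regardless of which Kafka partition a device happened to land in.
           // Assumes the device id is the first comma-separated field.
           .keyBy(event -> event.split(",", 2)[0])
           .map(new MapFunction<String, String>() {
               @Override
               public String map(String event) {
                   // Stand-in for the actual per-device processing logic.
                   return event;
               }
           })
           .print();

        env.execute("per-device-keyed-iot");
    }
}

Per-device ordering is still preserved through the keyBy, but a single very slow key can still back-pressure the pipeline, so this would complement rather than replace the partitioning and sizing suggestions discussed earlier in the thread.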