Hi

If there are no other obvious problems, you may need to take a heap dump of
the affected TaskManager and analyze its memory usage.
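
Before dumping memory, it can help to confirm which TaskManagers are really
the outliers. Below is a minimal sketch, not a definitive tool. It assumes
you have already collected, for each TaskManager, a list of garbage collector
entries with "name", "count" and "time" (in ms) fields, for example from the
TaskManager metrics; please verify the exact field layout for your Flink
version. The function names, the sample data and the 2x-median threshold are
all hypothetical choices for illustration.

```python
from statistics import median

# JVM collector names that handle the young generation; which one appears
# depends on the GC algorithm the TaskManager JVM is running with.
YOUNG_COLLECTORS = {"G1 Young Generation", "PS Scavenge", "ParNew", "Copy"}


def young_gc_time_ms(collectors):
    """Sum the 'time' field of the young-generation collectors of one TaskManager."""
    return sum(c["time"] for c in collectors if c["name"] in YOUNG_COLLECTORS)


def gc_outliers(gc_by_tm, factor=2.0):
    """Return ids of TaskManagers whose young GC time exceeds factor * median."""
    totals = {tm: young_gc_time_ms(cs) for tm, cs in gc_by_tm.items()}
    baseline = median(totals.values())
    return sorted(tm for tm, t in totals.items() if t > factor * baseline)


# Hypothetical sample data (assumption: not taken from a real cluster).
sample = {
    "tm-1": [{"name": "G1 Young Generation", "count": 120, "time": 900},
             {"name": "G1 Old Generation", "count": 1, "time": 50}],
    "tm-2": [{"name": "G1 Young Generation", "count": 130, "time": 1000}],
    "tm-3": [{"name": "G1 Young Generation", "count": 900, "time": 9000}],
}
print(gc_outliers(sample))  # prints ['tm-3']
```

Note that GC times are cumulative since JVM start, so TaskManagers with very
different uptimes are not directly comparable; comparing the difference
between two snapshots taken some time apart would be more robust.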

Best,
Shammon

On Fri, Feb 17, 2023 at 10:41 AM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi, Meghajit
>
> What kind of session cluster are you using? Standalone or Native?
> If it's standalone, you can check whether the TaskManager with heavy GC is
> running more tasks than the others. If so, you can set
> "cluster.evenly-spread-out-slots=true" to spread tasks evenly across all
> TaskManagers.
>
> Best,
> Weihua
>
>
> On Thu, Feb 16, 2023 at 10:52 PM Meghajit Mazumdar <
> meghajit.mazum...@gojek.com> wrote:
>
>> Hello,
>>
>> We have a Flink session cluster deployed on Kubernetes with around 100
>> TaskManagers. It currently runs around 20-30 Kafka source jobs. All jobs
>> use the same jar and differ only in the SQL query and the UDFs used. We
>> are using the official flink:1.14.3 image.
>>
>> We observed that one specific TaskManager does far more garbage
>> collection than the others. So much, in fact, that at a specific hour of
>> the day it pauses execution to do GC, which causes a huge consumer lag to
>> build up. By garbage collection, I mean GC of the young generation. The
>> old generation GC looks fine.
>>
>> We checked our other running Flink clusters and found the same behaviour
>> in most of them. In fact, there are always 2-3 TaskManagers that seem to
>> be doing more GC than the others.
>>
>> Is this a known issue? Our clusters run long-running Kafka source to
>> Kafka sink jobs, so we wanted to know whether that could be the cause.
>>
>> Would appreciate any kind of guidance.
>> --
>> *Regards,*
>> *Meghajit*
>>
>
