Re: Flink CDC job getting failed due to G1 old gc

Leonard Xu Sun, 01 Aug 2021 19:18:35 -0700

Hi, Ayush


Thanks for the detailed description.

Before analyzing the issue, I have two questions: which Flink and Flink CDC 
versions are you using? Is Flink CDC used in SQL or DataStream?
It would also be helpful if you could post your Flink CDC connector parameters.

Best,
Leonard

> On Jul 29, 2021, at 18:57, Ayush Chauhan <ayush.chau...@zomato.com> wrote:
> 
> Hi all,
> 
> We are using Flink + Iceberg to consume CDC data. We have combined all the 
> tables of a single DB in one job. Our job frequently runs into GC 
> issues. Earlier it was running on the default parallel GC, and I have changed it 
> to G1GC. G1GC did bring some improvements, but I am still facing the same 
> problem.
> 
> The following are the params on my job: -ytm 5120m -yjm 1024m -yD 
> env.java.opts="-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35"
> 
> This job is running CDC ingestion for 17 tables with a parallelism of 1, and 
> throughput is around 10k messages per 10-minute checkpointing interval.
> 
> I am attaching a part of the thread dump in this email.
> 
> During old GC, the job gets stuck and its checkpointing time, which is normally 
> under 1 sec, increases exponentially up to the timeout threshold. The job either 
> fails due to the checkpointing timeout or fails to get the heartbeat of 
> the task manager.
> 
> <Screenshot 2021-07-29 at 16.09.19.png>
> <Screenshot 2021-07-29 at 16.08.58.png>
> 
> 
> -- 
>  Ayush Chauhan
> 
> 
> 
> This email is intended only for the person or the entity to whom it is 
> addressed. If you are not the intended recipient, please delete this email 
> and contact the sender.
> <thread_dump.txt>
