Re: Kafka Consumer Retries Failing

2021-07-19 Thread Piotr Nowojski
Ok, thanks for the update. Great that you managed to resolve this issue :) Best, Piotrek On Mon, 19 Jul 2021 at 17:13, Rahul Patwari wrote: > Hi Piotrek, > > I was just about to update. > You are right. The issue is because of a stalled task manager due to High > Heap Usage. And the High Heap

Re: Kafka Consumer Retries Failing

2021-07-19 Thread Rahul Patwari
Hi Piotrek, I was just about to update. You are right. The issue is because of a stalled task manager due to High Heap Usage. And the High Heap Usage is because of a Memory Leak in a library we are using. Thanks for your help. On Mon, Jul 19, 2021 at 8:31 PM Piotr Nowojski wrote: > Thanks for

Re: Kafka Consumer Retries Failing

2021-07-19 Thread Piotr Nowojski
Thanks for the update. > Could the backpressure timeout and heartbeat timeout be because of Heap Usage close to Max configured? Could be. This is one of the things I had in mind under overloaded in: > might be related to one another via some different deeper problem (broken network environment,

Re: Kafka Consumer Retries Failing

2021-07-15 Thread Rahul Patwari
Thanks for the feedback Piotrek. We have observed the issue again today. As we are using Flink 1.11.1, I tried to check the backpressure of Kafka source tasks from the Jobmanager UI. The backpressure request was canceled due to Timeout and "No Data" was displayed in UI. Here are the respective log

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi Rahul, I would highly doubt that you are hitting the network bottleneck case. It would require either a broken environment/network or throughputs on the order of GB/second. More likely you are seeing an empty input pool and you haven't checked the documentation [1]: > inPoolUsage - An estimate of th

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Rahul Patwari
Thanks, Piotrek. We have two Kafka sources. We are facing this issue for both of them. The downstream tasks with the sources form two independent directed acyclic graphs, running within the same Streaming Job. For Example: source1 -> task1 -> sink1 source2 -> task2 -> sink2 There is backpressure

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Piotr Nowojski
Hi, Waiting for memory from LocalBufferPool is a perfectly normal symptom of backpressure [1][2]. Best, Piotrek [1] https://flink.apache.org/2021/07/07/backpressure.html [2] https://www.ververica.com/blog/how-flink-handles-backpressure On Wed, 14 Jul 2021 at 06:05, Rahul Patwari wrote: > Th

Re: Kafka Consumer Retries Failing

2021-07-13 Thread Rahul Patwari
Thanks, David and Piotr, for your replies. I managed to capture the thread dump from the JobManager UI for a few task managers. Here is the thread dump for Kafka Source tasks in one task manager. I could see the same stack trace in other task managers as well. It seems like Kafka Source tasks are waiting on

Re: Kafka Consumer Retries Failing

2021-07-13 Thread Piotr Nowojski
Hi, I'm not sure, maybe someone will be able to help you, but it sounds like it would be better for you to: - google search something like "Kafka Error sending fetch request TimeoutException" (I see there are quite a lot of results, some of them might be related) - ask this question on the Kafka m

Kafka Consumer Retries Failing

2021-07-09 Thread Rahul Patwari
Hi, We have a Flink 1.11.1 streaming pipeline in production which reads from Kafka. Kafka Server version is 2.5.0 - confluent 5.5.0 Kafka Client version is 2.4.1 - {"component":"org.apache.kafka.common.utils.AppInfoParser$AppInfo","message":"Kafka version: 2.4.1","method":""} Occasionally