Re: Flink application kept restarting

2021-03-04 Thread Rainie Li
l only help if the cause of the time is very transient. Julian From: Rainie Li Date: Thursday, March 4, 2021 at 1:49 PM To: "matth...@ververica.com" Cc: user, Chesnay Schepler Subject: Re: Flink application kept res

Re: Flink application kept restarting

2021-03-04 Thread Jaffe, Julian
time is very transient. Julian From: Rainie Li Date: Thursday, March 4, 2021 at 1:49 PM To: "matth...@ververica.com" Cc: user, Chesnay Schepler Subject: Re: Flink application kept restarting Hi Matthias, Do you have any suggestions to handle timeout issues when fetching data fro

Re: Flink application kept restarting

2021-03-04 Thread Rainie Li
Hi Matthias, Do you have any suggestions for handling timeout issues when fetching data from a Kafka topic? I am thinking of adding retry logic to the Flink job, but I am not sure if this is the right direction. Thanks again Best regards Rainie On Wed, Mar 3, 2021 at 12:24 AM Matthias Pohl wrote: Hi Rai
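
The retry idea raised in this message could look roughly like the following. This is only a minimal sketch of a generic retry-with-exponential-backoff helper in plain Java; it is not taken from the thread and uses no Flink or Kafka API. The class name RetrySketch, both method names, and all parameters are hypothetical. As noted elsewhere in the thread, retrying is only likely to help when the cause of the timeout is transient.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Flink or the Kafka client.
final class RetrySketch {

    // Delay before retrying after the given failed attempt (1-based):
    // starts at baseMillis, doubles per attempt, and is capped at maxMillis.
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Runs the action up to maxAttempts times, sleeping between failed attempts.
    // Rethrows the last exception if every attempt fails.
    public static <T> T callWithRetry(Callable<T> action, int maxAttempts,
                                      long baseMillis, long maxMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

A bounded number of attempts keeps a persistent failure visible: once the attempts are exhausted the original exception propagates, so the job's configured restart handling can still take over instead of the task retrying forever.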

Re: Flink application kept restarting

2021-03-03 Thread Rainie Li
I see. Thank you for the explanation. Best regards Rainie On Wed, Mar 3, 2021 at 12:24 AM Matthias Pohl wrote: Hi Rainie, in general buffer pools being destroyed usually mean that some other exception occurred that caused the task to fail and in the process of failure handling the opera

Re: Flink application kept restarting

2021-03-03 Thread Matthias Pohl
Hi Rainie, a destroyed buffer pool usually means that some other exception occurred that caused the task to fail, and that the operator-related network buffer was destroyed in the process of failure handling. That causes the "java.lang.RuntimeException: Buffer pool is destroyed." in your ca

Re: Flink application kept restarting

2021-03-02 Thread Rainie Li
Thanks for checking, Matthias. I have another Flink job which failed last weekend with the same "buffer pool destroyed" error. This job is also running version 1.9. Here is the error I found in the TaskManager log. Any suggestions on what the root cause is and how to fix it? 2021-02-28 00:54:45,943

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
Another question: the timeout of 48 hours sounds strange. Some other system should have noticed the connection problem earlier, assuming that you have a reasonably low heartbeat interval configured. Matthias On Mon, Mar 1, 2021 at 1:22 PM Matthias Pohl wrote: Thanks for providin

Re: Flink application kept restarting

2021-03-01 Thread Matthias Pohl
Thanks for providing this information, Rainie. Are other issues documented in the logs besides the TimeoutException in the JM logs which you already shared? For now, it looks like there was a connection problem between the TaskManager and the JobManager that caused the shutdown of the operator

Re: Flink application kept restarting

2021-02-26 Thread Rainie Li
Thank you, Matthias. It's version 1.9. Best regards Rainie On Fri, Feb 26, 2021 at 6:33 AM Matthias Pohl wrote: Hi Rainie, the network buffer pool was destroyed for some reason. This happens when the NettyShuffleEnvironment gets closed which is triggered when an operator is cleaned up, for

Re: Flink application kept restarting

2021-02-26 Thread Matthias Pohl
Hi Rainie, the network buffer pool was destroyed for some reason. This happens when the NettyShuffleEnvironment gets closed, which is triggered when an operator is cleaned up, for instance. Maybe the timeout in the metric system caused this, but I'm not sure how this is connected. I'm gonna add Che