Re: Flink taskmanager in crash loop

2021-08-17 Thread Yangze Guo
> 2021-08-16 15:58:13.986 [Cancellation Watchdog for Source: MASKED] ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down... org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully with

Re: Flink taskmanager in crash loop

2021-08-17 Thread Abhishek Rai
Before these messages, there is the following message in the log: 2021-08-12 23:02:58.015 [Canceler/Interrupts for Source: MASKED]) (1/1)#29103' did not react to cancelling signal for 30 seconds, but is stuck in method: java.base@11.0.11/jdk.internal.misc.Unsafe.park(Native Method) java.base@11.0.

Re: Flink taskmanager in crash loop

2021-08-17 Thread Abhishek Rai
Thanks, Yangze. Indeed, I see the following in the log about 10s before the final crash (I masked some sensitive data using `MASKED`): 2021-08-16 15:58:13.985 [Canceler/Interrupts for Source: MASKED] WARN org.apache.flink.runtime.taskmanager.Task - Task 'MASKED' did not react to cancelling signal fo

Re: Flink taskmanager in crash loop

2021-08-16 Thread Yangze Guo
Hi Abhishek, do you see something like "Fatal error occurred while executing the TaskManager" in your log? Or could you provide the whole TaskManager log? Best, Yangze Guo On Tue, Aug 17, 2021 at 5:17 AM Abhishek Rai wrote: > Hello, > In our production environment, running Flink 1.

Flink taskmanager in crash loop

2021-08-16 Thread Abhishek Rai
Hello, In our production environment, running Flink 1.13 (Scala 2.11), Flink had been working without issues for a while with a dozen or so jobs, but the TaskManager started crash looping, with a period of ~4 minutes per crash. The stack trace is not very informative, therefore reachin