Re: Recovery problem 1 of 2 in Flink 1.6.3

2019-01-15 Thread Till Rohrmann
Hi John, this is definitely not how Flink should behave in this situation and could indicate a bug. I couldn't figure out the problem from the logs. Would it be possible to get the full logs of the TMs and the JM at DEBUG log level? This would help me debug the problem further. Cheers, Till

Re: Recovery problem 1 of 2 in Flink 1.6.3

2019-01-14 Thread John Stone
Is this a known issue? Should I create a Jira ticket? Does anyone have anything they would like me to try? I’m very lost at this point. I’ve now seen this issue happen without destroying pods, i.e. the running job crashes after several hours and fails to recover once all task slots are consu