Hi Jason,

getting access to the log files would help the most in figuring out what's going wrong.

Cheers,
Till

On Tue, Jan 28, 2020 at 9:08 AM Arvid Heise <ar...@ververica.com> wrote:

> Hi Jason,
>
> could you describe your topology? Are you writing to Kafka? Are you using
> exactly once? Are you seeing any warnings?
> If so, one thing that immediately comes to my mind is
> transaction.max.timeout.ms. If the value in Flink (by default 1h) is
> higher than what the Kafka brokers support, it may run into indefinite
> restart loops in rare cases.
>
> "Kafka brokers by default have transaction.max.timeout.ms set to 15
> minutes. This property will not allow to set transaction timeouts for the
> producers larger than its value. FlinkKafkaProducer011 by default sets
> the transaction.timeout.ms property in producer config to 1 hour, thus
> transaction.max.timeout.ms should be increased before using the
> Semantic.EXACTLY_ONCE mode."
>
> Best,
>
> Arvid
>
> On Fri, Jan 24, 2020 at 2:47 AM Jason Kania <jason.ka...@ymail.com> wrote:
>
>> I am attempting to migrate from 1.7.1 to 1.9.1 and I have hit a problem
>> where previously working jobs can no longer launch after being submitted.
>> In the UI, the submitted jobs show up as deploying for a period, then go
>> into a run state before returning to the deploy state, and this repeats
>> regularly with the job bouncing between states. No exceptions or errors are
>> visible in the logs. There is no data coming in for the job to process and
>> the Kafka queues are empty.
>>
>> If I look at the thread activity of the task manager running the job in
>> top, I see that the busiest threads are flink-akka threads, sometimes
>> jumping to very high CPU numbers. That is all I have for info.
>>
>> Any suggestions on how to debug this? I can set breakpoints and connect
>> if that helps, just not sure at this point where to start.
>>
>> Thanks,
>>
>> Jason
>>
>
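
For reference, the timeout mismatch Arvid mentions can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class name, the helper methods, and the broker address `localhost:9092` are hypothetical, and the code only builds and inspects a `java.util.Properties` object without contacting Kafka or Flink. The two config keys (`transaction.timeout.ms` on the producer, `transaction.max.timeout.ms` on the broker) and the 1 hour / 15 minute defaults are the ones quoted in the thread.

```java
import java.util.Properties;

// Hypothetical helper class, not part of Flink or Kafka.
class KafkaTxnTimeoutSketch {
    // Broker-side default for transaction.max.timeout.ms, as quoted above: 15 minutes.
    static final long BROKER_DEFAULT_MAX_TIMEOUT_MS = 15L * 60 * 1000;
    // Producer-side transaction.timeout.ms that FlinkKafkaProducer011 sets by default: 1 hour.
    static final long FLINK_DEFAULT_TIMEOUT_MS = 60L * 60 * 1000;

    // Build a producer config with an explicit transaction timeout.
    static Properties producerProps(String bootstrapServers, long txnTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("transaction.timeout.ms", Long.toString(txnTimeoutMs));
        return props;
    }

    // True if the requested producer timeout does not exceed the broker's limit.
    static boolean acceptedByBroker(Properties props, long brokerMaxTimeoutMs) {
        long requested = Long.parseLong(props.getProperty("transaction.timeout.ms"));
        return requested <= brokerMaxTimeoutMs;
    }

    public static void main(String[] args) {
        // Flink's 1 hour default against an unmodified broker: rejected.
        Properties defaults = producerProps("localhost:9092", FLINK_DEFAULT_TIMEOUT_MS);
        System.out.println(acceptedByBroker(defaults, BROKER_DEFAULT_MAX_TIMEOUT_MS));

        // Lowering the producer timeout to the broker limit: accepted.
        Properties lowered = producerProps("localhost:9092", BROKER_DEFAULT_MAX_TIMEOUT_MS);
        System.out.println(acceptedByBroker(lowered, BROKER_DEFAULT_MAX_TIMEOUT_MS));
    }
}
```

In a real job, a `Properties` object like this would presumably be handed to the Kafka producer as its config. The alternative fix, as the quoted documentation suggests, is to raise `transaction.max.timeout.ms` on the brokers instead of lowering the producer value.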