I am attempting to migrate from Flink 1.7.1 to 1.9.1 and have hit a problem
where previously working jobs no longer launch after being submitted. In the
UI, the submitted jobs show as deploying for a while, then go into the running
state before returning to the deploying state. This repeats regularly, with the
job bouncing between the two states. No exceptions or errors are visible in the logs. There
is no data coming in for the job to process, and the Kafka queues are empty.
When I look in top at the thread activity of the task manager running the job,
the busiest threads are the flink-akka threads, which sometimes jump to very
high CPU usage. That is all the information I have.
Any suggestions on how to debug this? I can set breakpoints and connect a
debugger if that helps; I am just not sure at this point where to start.
Thanks,
Jason