Hello guys,

We have been seeing an issue with our Flink applications. They run fine for several hours, and then we see an exception like this:
java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.

For some applications, this exception appears once; it stays in the exception history for a while, but the job recovers. For other applications, the exception is thrown repeatedly and the application goes into a crash loop.

Since the applications had been running fine for several hours before the exception appeared, our suspicion is that when the crash happens, the job manager aggressively tries to restart the job and cannot acquire enough resources because the previous job has not finished cleaning up yet.

Has anyone else been seeing this issue? If so, what did you try to fix it?

Thanks,
HKB