Hi Abhishek,
Does your job use checkpointing? It looks like this is the first time the
respective checkpoint/savepoint thread pool is used, and at that point
there are not enough file handles left.
Do you have a way to inspect the ulimits on the task managers?
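If you have shell access to the task manager containers (e.g. via kubectl exec, assuming a Linux environment), a quick sketch of how to check the file-descriptor limits:

```shell
# Soft and hard limits on open file descriptors for the current shell:
ulimit -Sn
ulimit -Hn
# Per-process view (here for the shell itself; substitute the
# TaskManager JVM's PID to see what the Flink process gets):
grep 'open files' /proc/self/limits
```

If the soft limit is low (e.g. 1024), that would be consistent with the thread pool running out of handles on first use.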
If you don't have a way to change the limits,
Hello,
I am observing a failure whenever I trigger a savepoint on my Flink
application, which otherwise runs without issues.
The app is deployed via AWS KDA (Kubernetes) with 256 KPU: 6 task managers
with 43 slots each, where 1 KPU = 1 vCPU, 4 GB memory, and 50 GB disk
space. It uses the RocksDB state backend.
The sav