Hi guys, I was running a Flink job (parallelism 12) on an EMR cluster with 48 YARN slots. When the job started, the Flink UI showed that it took 12 slots, leaving 36 available.
I would expect that when the job fails, it restarts from the latest checkpoint by taking another 12 slots and freeing the original 12. *However, I observed that the job took new slots but never freed the original ones. The Flink job ended up being killed by YARN because there were no available slots left.*

Here's the command I used to run the Flink job:

```
flink run -m yarn-cluster -yn 6 -ys 8 -ytm 40000 xxx.jar
```

Does anyone know what's going wrong?

Thanks,
Bowen