Hi guys,
    I was running a Flink job (parallelism 12) on an EMR cluster with 48
YARN slots. When the job started, I could see in the Flink UI that it took
12 slots, leaving 36 slots available.

    I would expect that when the job fails, it restarts from the latest
checkpoint, taking another 12 slots and freeing the original 12 slots.
*However, I observed that the job took new slots but never freed the
original ones. The Flink job ended up being killed by YARN because there
were no available slots left.*

     Here's the command I used to run the Flink job:

     ```
     flink run -m yarn-cluster -yn 6 -ys 8 -ytm 40000  xxx.jar
     ```
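
    For reference, here is the slot arithmetic as I understand it, as a
rough sketch only. It assumes -yn is the number of YARN containers
(TaskManagers) and -ys is the number of slots per TaskManager, and that
each restart takes 12 new slots without releasing the old ones, which is
what I observed. The variable names are just for illustration.

```shell
# Values taken from the command above (-yn 6 -ys 8) and the job parallelism.
containers=6            # -yn: number of YARN containers / TaskManagers
slots_per_container=8   # -ys: slots per TaskManager
parallelism=12          # slots taken by each job attempt

# Total slots available to the job.
total_slots=$((containers * slots_per_container))

# Assumption: no slots are ever freed, so every attempt (the initial run
# plus each restart) consumes another $parallelism slots.
attempts_until_full=$((total_slots / parallelism))

echo "total slots: $total_slots"
echo "attempts until all slots are used: $attempts_until_full"
```

    If that is right, the initial run plus three restarts would use up
all 48 slots, which matches the point where the job gets killed.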

     Does anyone know what's going wrong?

Thanks,
Bowen
