I'm running a YARN cluster of 8 instances with 4 cores each (32 cores in total), configured with 3 slots per TM. The cluster is dedicated to a single job that runs at full capacity in "FLIP-6" mode. The job's parallelism is therefore 21 (7 TMs * 3 slots, with one container dedicated to the Job Manager).
When I run the job on 1.6.0, seven Task Managers are spun up as expected. But when I run it on 1.6.2, only four Task Managers spin up and the job hangs waiting for more resources. Our Flink distribution is set up by a script after building from source, so aside from the Flink jars, the 1.6.0 and 1.6.2 directories are identical. The job is the same in both cases, restarted from a savepoint. The problem is repeatable. Has something changed in 1.6.2, and if so, can it be remedied with a config change?
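
To spell out the slot arithmetic, here is a minimal Python sketch. The function and variable names are purely illustrative (they are not Flink settings), and the numbers are the ones given in this post.

```python
# Illustrative slot arithmetic for the cluster described above.
# All names here are hypothetical helpers, not Flink configuration keys.

def slot_capacity(task_managers: int, slots_per_tm: int) -> int:
    """Total task slots offered by a number of Task Managers."""
    return task_managers * slots_per_tm

INSTANCES = 8               # YARN instances, 4 cores each
SLOTS_PER_TM = 3            # configured slots per Task Manager
JOB_MANAGER_CONTAINERS = 1  # one container is taken by the Job Manager

expected_tms = INSTANCES - JOB_MANAGER_CONTAINERS       # Task Managers expected
required = slot_capacity(expected_tms, SLOTS_PER_TM)    # slots needed = job parallelism
observed_162 = slot_capacity(4, SLOTS_PER_TM)           # slots from the 4 TMs seen on 1.6.2
shortfall = required - observed_162                     # slots the job is still waiting for

print(f"required slots: {required}")
print(f"slots available on 1.6.2: {observed_162}")
print(f"shortfall: {shortfall}")
```

With only four Task Managers, the available slots fall short of the job's parallelism, which matches the job hanging while it waits for more resources.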