Hi Cliff,

the TaskManagers fail to start with exit code 31, which indicates an initialization error on startup. If you check the TaskManager logs via `yarn logs -applicationId <APP_ID>`, you should see why the TMs don't start up.
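
The aggregated output can be long, so here is a minimal sketch for narrowing it down to lines that look like errors. The helper name `filter_tm_errors` and the match pattern are assumptions for illustration, not part of Flink or YARN; adjust the pattern to whatever your log format uses.

```shell
# Hypothetical helper: read log text on stdin and keep only lines that
# look like errors or exception headers. The pattern is an assumption.
filter_tm_errors() {
  grep -E 'ERROR|FATAL|Exception'
}

# Usage (APP_ID is a placeholder for your YARN application ID):
#   yarn logs -applicationId "$APP_ID" | filter_tm_errors
```

The first matching lines from a container that exited with code 31 are usually the most relevant, since later errors are often just consequences of the failed startup.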

Cheers,
Till

On Fri, Nov 9, 2018 at 8:32 PM Cliff Resnick <cre...@gmail.com> wrote:

> Hi Till,
>
> Here are Job Manager logs, same job in both 1.6.0 and 1.6.2 at DEBUG
> level. I saw several errors in 1.6.2, hope it's informative!
>
> Cliff
>
> On Fri, Nov 9, 2018 at 8:34 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Cliff,
>>
>> this sounds not right. Could you share the logs of the Yarn cluster
>> entrypoint with the community for further debugging? Ideally on DEBUG
>> level. The Yarn logs would also be helpful to fully understand the problem.
>> Thanks a lot!
>>
>> Cheers,
>> Till
>>
>> On Thu, Nov 8, 2018 at 9:59 PM Cliff Resnick <cre...@gmail.com> wrote:
>>
>>> I'm running a YARN cluster of 8 * 4 core instances = 32 cores, with a
>>> configuration of 3 slots per TM. The cluster is dedicated to a single job
>>> that runs at full capacity in "FLIP6" mode. So in this cluster, the
>>> parallelism is 21 (7 TMs * 3, one container dedicated for Job Manager).
>>>
>>> When I run the job in 1.6.0, seven Task Managers are spun up as
>>> expected. But if I run with 1.6.2 only four Task Managers spin up and the
>>> job hangs waiting for more resources.
>>>
>>> Our Flink distribution is set up by script after building from source.
>>> So aside from flink jars, both 1.6.0 and 1.6.2 directories are identical.
>>> The job is the same, restarting from savepoint. The problem is repeatable.
>>>
>>> Has something changed in 1.6.2, and if so can it be remedied with a
>>> config change?