Hey Stefano,

Flink's resource management was recently refactored for 1.1, so this could be a regression introduced by that change. Max can probably help you with more details. Is this currently a blocker for you?
– Ufuk

On Tue, Apr 19, 2016 at 6:31 PM, Stefano Baghino <stefano.bagh...@radicalbit.io> wrote:
> Hi everyone,
>
> I'm currently experiencing a weird situation, I hope you can help me out
> with this.
>
> I've cloned and built from the master, then I've edited the default config
> file by adding my Hadoop config path, exported the HADOOP_CONF_DIR env var
> and ran bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm 2048
>
> The first thing I noticed is that I had to put "-s 2" or the task managers
> get created with -1 slots (!) by default.
>
> After putting "-s 2" the YARN session startup hangs when trying to register
> the task managers. I've stopped the session and aggregated the logs and read
> a lot (several thousands) of the messages I attach at the bottom; any idea
> of what this may be?
>
> Thank you a lot in advance!
>
> 2016-04-19 12:15:59,507 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
>
> 2016-04-19 12:15:59,649 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:00,025 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 2, timeout:
> 1000 milliseconds)
>
> 2016-04-19 12:16:00,033 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:01,045 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 3, timeout:
> 2000 milliseconds)
>
> 2016-04-19 12:16:01,053 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:03,064 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 4, timeout:
> 4000 milliseconds)
>
> 2016-04-19 12:16:03,072 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:07,085 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 5, timeout:
> 8000 milliseconds)
>
> 2016-04-19 12:16:07,092 ERROR org.apache.flink.yarn.YarnTaskManager
> - The registration at JobManager
> Some(akka.tcp://flink@172.31.20.101:57379/user/jobmanager) was refused,
> because: java.lang.IllegalStateException: Resource
> ResourceID{resourceId='container_e02_1461077293721_0016_01_000002'} not
> registered with resource manager.. Retrying later...
>
> 2016-04-19 12:16:09,664 INFO org.apache.flink.yarn.YarnTaskManager
> - Trying to register at JobManager
> akka.tcp://flink@172.31.20.101:57379/user/jobmanager (attempt 1, timeout:
> 500 milliseconds)
>
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit