The related JobManager log is at https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8
> On Feb 14, 2019, at 9:40 AM, Ethan Li <ethanopensou...@gmail.com> wrote:
>
> Hello,
>
> I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager,
> and ZooKeeper. I first started the JM and TM and waited for them to become
> stable. Then I restarted the JM. That is when the TM got confused.
>
> The TM was notified that the leader had changed, and it tried to register
> with the new leader (the new RPC port is 34561). It then got an
> acknowledgement saying it was already registered. After that it kept trying
> to associate with the old JM RPC port (35213) and failing.
>
> 2019-02-14 14:56:54,059 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> 2019-02-14 14:56:54,157 DEBUG org.apache.flink.shaded.akka.org.jboss.netty.handler.ssl.SslHandler - [id: 0x77ac93ae, /10.215.68.243:46796 => openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:34561] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
> 2019-02-14 14:56:54,276 INFO org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager), starting network stack and library cache.
> 2019-02-14 14:56:54,276 INFO org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:50100. Starting BLOB cache.
> 2019-02-14 14:56:54,278 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-927b523f-f3ff-4ccc-83a0-362e09a3b858
> 2019-02-14 14:56:54,279 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-8492465e-0e94-4792-a346-66e6da299f7a
> 2019-02-14 14:56:54,572 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - TaskManager was triggered to register at JobManager, but is already registered
> 2019-02-14 14:56:56,359 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213
> 2019-02-14 14:56:56,360 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - The association error event's root cause is not of type InvalidAssociationException.
>
> Full TaskManager log:
> https://gist.github.com/Ethanlm/e6f1b29d27d26813f5f8f40cd2c12643
>
> Is this expected or is this a bug?
>
> Thank you!
>
> Ethan
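
For anyone who wants to check their own TaskManager log for the same symptom, here is a rough sketch that compares the port of the last successful registration with the ports that later get "Connection refused". The helper name `stale_ports` and both regular expressions are my own assumptions, written against the message wording in the excerpt above with each log entry on a single line. Nothing here is part of Flink, and other versions may word these messages differently.

```python
import re

# Hypothetical helper, not part of Flink. The patterns assume the message
# wording seen in the quoted flink-1.4.2 log, one log entry per line.
REGISTERED = re.compile(
    r"Successful\s+registration at JobManager\s+\(\S+?:(\d+)/user/jobmanager"
)
REFUSED = re.compile(r"Connection refused:\s+\S+?:(\d+)")


def stale_ports(log_text):
    """Return refused ports that differ from the last registered JobManager port."""
    registered = REGISTERED.findall(log_text)
    if not registered:
        return []  # no registration seen, so nothing to compare against
    current = registered[-1]
    refused = REFUSED.findall(log_text)
    return sorted({port for port in refused if port != current})


# Two entries copied from the quoted log (address elision kept as in the mail).
sample = (
    "2019-02-14 14:56:54,276 INFO org.apache.flink.runtime.taskmanager.TaskManager"
    " - Successful registration at JobManager"
    " (akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager),"
    " starting network stack and library cache.\n"
    "2019-02-14 14:56:56,359 WARN akka.remote.transport.netty.NettyTransport"
    " - Remote connection to [null] failed with java.net.ConnectException:"
    " Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213\n"
)

print(stale_ports(sample))
```

A non-empty result only says that the TM is still dialing a port other than the one it last registered with. It does not say whether that is expected behavior or a bug.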