Hi Ethan,

Can you observe similar behaviour with Flink 1.7.1? Flink 1.4.2 is no longer supported by the community.
Cheers,
Till

On Thu, Feb 14, 2019 at 5:06 PM Ethan Li <ethanopensou...@gmail.com> wrote:

> The related JobManager log is
> https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8
>
> On Feb 14, 2019, at 9:40 AM, Ethan Li <ethanopensou...@gmail.com> wrote:
>
> Hello,
>
> I have a standalone flink-1.4.2 cluster with one JobManager, one
> TaskManager, and ZooKeeper. I first started the JM and TM and waited for them
> to become stable. Then I restarted the JM. That's when the TM got confused.
>
> The TM was notified that the leader node had changed, and it tried to register
> with the new leader (the new RPC port is 34561). It then got an acknowledgement
> saying it was already registered. After that, it kept trying to associate with
> the old JM RPC port (35213) and failing.
>
> 2019-02-14 14:56:54,059 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager (attempt 1, timeout: 500 milliseconds)
> 2019-02-14 14:56:54,157 DEBUG org.apache.flink.shaded.akka.org.jboss.netty.handler.ssl.SslHandler - [id: 0x77ac93ae, /10.215.68.243:46796 => openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:34561] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA
> 2019-02-14 14:56:54,276 INFO org.apache.flink.runtime.taskmanager.TaskManager - Successful registration at JobManager (akka.ssl.tcp://fl...@openstorm10blue-n1.blue.ygrid.yahoo.com:34561/user/jobmanager), starting network stack and library cache.
> 2019-02-14 14:56:54,276 INFO org.apache.flink.runtime.taskmanager.TaskManager - Determined BLOB server address to be openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:50100. Starting BLOB cache.
> 2019-02-14 14:56:54,278 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-927b523f-f3ff-4ccc-83a0-362e09a3b858
> 2019-02-14 14:56:54,279 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /home/y/var/flink/blobstorage/blobStore-8492465e-0e94-4792-a346-66e6da299f7a
> 2019-02-14 14:56:54,572 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - TaskManager was triggered to register at JobManager, but is already registered
> 2019-02-14 14:56:56,359 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: openstorm10blue-n1.blue.ygrid.yahoo.com/10.215.68.98:35213
> 2019-02-14 14:56:56,360 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - The association error event's root cause is not of type InvalidAssociationException.
>
> Full TaskManager log:
> https://gist.github.com/Ethanlm/e6f1b29d27d26813f5f8f40cd2c12643
>
> Is this expected, or is this a bug?
>
> Thank you!
>
> Ethan