[ https://issues.apache.org/jira/browse/FLINK-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166478#comment-17166478 ]
Till Rohrmann edited comment on FLINK-18733 at 7/28/20, 2:56 PM:
-----------------------------------------------------------------

I tested the scenario by running a ZooKeeper cluster on the same machine where I also started Flink (but not in the same process). I did not configure SASL, and it worked when using a resolvable name. When configuring an unresolvable name, I saw the exact same stack traces as you did (including the SASL part). Hence, I assume that ZooKeeper/Curator also used this code path when running the test with a resolvable name.

Have you checked whether there is a ZooKeeper issue for a similar problem?

> Jobmanager cannot start in HA mode with Zookeeper
> -------------------------------------------------
>
>                 Key: FLINK-18733
>                 URL: https://issues.apache.org/jira/browse/FLINK-18733
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>         Attachments: flink-conf.yaml,
>                      flink-liquidnt-standalonesession-0-nj1dvloglab01.liquidnet.biz.log,
>                      flink-liquidnt-taskexecutor-0-nj1dvloglab01.liquidnet.biz.log
>
>
> When configured in HA mode, the Jobmanager cannot start at all. First, it
> issues warnings like this:
> {quote}{{2020-07-27 08:58:23,197 WARN
> org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] -
> Session 0x0 for server *nj1dvloglab01.liquidnet.biz/<unresolved>:2181*,
> unexpected error, closing socket connection and attempting reconnect}}
> {{java.lang.IllegalArgumentException: *Unable to canonicalize address*
> nj1dvloglab01.liquidnet.biz/<unresolved>:2181 because it's not resolvable}}
> {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65)
> ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
> {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
> ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
> {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001)
> ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
> {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
> [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
> {quote}
> After a few attempts to connect to Zookeeper, it finally fails:
> {quote}2020-07-27 08:59:35,055 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error
> occurred in the cluster entrypoint.
> org.apache.flink.util.FlinkException: Unhandled error in
> ZooKeeperLeaderElectionService: Ensure path threw exception
> at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService.unhandledError(ZooKeeperLeaderElectionService.java:430)
> ~[flink-dist_2.12-1.11.1.jar:1.11.1]
> {quote}
>
> The same HA configuration works fine for me in Flink 1.10.0.
>

--
This message was sent by Atlassian Jira (v8.3.4#803005)