Hi community! Right after upgrading Flink to flink-1.6.1, I get an exception when deploying a job to a YARN cluster. The job is submitted in detached mode with ZooKeeper HA enabled.
The flink-conf.yaml contains the following properties:

    high-availability: zookeeper
    high-availability.zookeeper.quorum: <a list of zookeeper hosts>
    high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
    high-availability.zookeeper.path.root: <flink-root-path>
    high-availability.zookeeper.path.namespace: <flink-job-name>

The job is deployed via the Flink CLI with a command like the following:

    "${FLINK_HOME}/bin/flink" run \
        -m yarn-cluster \
        -ynm "${JOB_NAME}-${JOB_VERSION}" \
        -yn "${tm_containers}" \
        -ys "${tm_slots}" \
        -ytm "${tm_memory}" \
        -yjm "${jm_memory}" \
        -p "${parallelism}" \
        -yqu "${queue}" \
        -yt "${YARN_APP_PATH}" \
        -c "${MAIN_CLASS}" \
        -yst \
        -yd \
        ${class_path} \
        "${YARN_APP_PATH}"/"${APP_JAR}"

After the job has been successfully deployed, I get the following exception:

    2018-10-26 18:29:17,781 | ERROR | Curator-Framework-0 | org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl | Background exception was not retry-able or retry gave up
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1097)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1130)
        at org.apache.flink.shaded.curator.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CreateBuilderImpl$7.performBackgroundOperation(CreateBuilderImpl.java:561)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:72)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:831)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

If the job is deployed in attached mode (without -yd), everything works fine.

Kind regards,
Mike Pryakhin