flink-1.6.1 :: job deployment :: detached mode

Mikhail Pryakhin Fri, 26 Oct 2018 08:46:57 -0700

Hi community!

Right after upgrading to flink-1.6.1, I get an exception during job 
deployment as a YARN cluster. 
The job is submitted with ZooKeeper HA enabled, in detached mode.


The flink yaml contains the following properties:

high-availability: zookeeper
high-availability.zookeeper.quorum: <a list of zookeeper hosts>
high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
high-availability.zookeeper.path.root: <flink-root-path>
high-availability.zookeeper.path.namespace: <flink-job-name>

The job is deployed with a flink CLI command like the following:

"${FLINK_HOME}/bin/flink" run \
    -m yarn-cluster \
    -ynm "${JOB_NAME}-${JOB_VERSION}" \
    -yn "${tm_containers}" \
    -ys "${tm_slots}" \
    -ytm "${tm_memory}" \
    -yjm "${jm_memory}" \
    -p "${parallelism}" \
    -yqu "${queue}" \
    -yt "${YARN_APP_PATH}" \
    -c "${MAIN_CLASS}" \
    -yst \
    -yd \
    ${class_path} \
    "${YARN_APP_PATH}"/"${APP_JAR}"


After the job has been successfully deployed, I get the following exception:

2018-10-26 18:29:17,781 | ERROR | Curator-Framework-0 | org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl | Background exception was not retry-able or retry gave up
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1097)
        at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1130)
        at org.apache.flink.shaded.curator.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CreateBuilderImpl$7.performBackgroundOperation(CreateBuilderImpl.java:561)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:72)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:831)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

If the job is deployed in attached mode, everything works fine.





Kind Regards,
Mike Pryakhin
