Hi Mike, could you also send me the YarnJobClusterEntrypoint logs? Thanks!
Cheers,
Till

On Wed, Nov 7, 2018 at 9:27 PM Mikhail Pryakhin <m.prya...@gmail.com> wrote:

> Hi Till,
> Thank you for your reply.
> Yes, I’ve upgraded to the latest Flink-1.6.2 and the problem is still
> there. Please find the log file attached.
>
> Kind Regards,
> Mike Pryakhin
>
> On 7 Nov 2018, at 18:46, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Mike,
>
> have you tried whether the problem also occurs with Flink 1.6.2? If yes,
> then please share with us the Flink logs with DEBUG log level to further
> debug the problem.
>
> Cheers,
> Till
>
> On Fri, Oct 26, 2018 at 5:46 PM Mikhail Pryakhin <m.prya...@gmail.com>
> wrote:
>
>> Hi community!
>>
>> Right after upgrading to flink-1.6.1, I get an exception
>> during job deployment as a YARN cluster.
>> The job is submitted with ZooKeeper HA enabled, in detached mode.
>>
>> The flink yaml contains the following properties:
>>
>> high-availability: zookeeper
>> high-availability.zookeeper.quorum: <a list of zookeeper hosts>
>> high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
>> high-availability.zookeeper.path.root: <flink-root-path>
>> high-availability.zookeeper.path.namespace: <flink-job-name>
>>
>> The job is deployed via a flink CLI command like the following:
>>
>> "${FLINK_HOME}/bin/flink" run \
>>   -m yarn-cluster \
>>   -ynm "${JOB_NAME}-${JOB_VERSION}" \
>>   -yn "${tm_containers}" \
>>   -ys "${tm_slots}" \
>>   -ytm "${tm_memory}" \
>>   -yjm "${jm_memory}" \
>>   -p "${parallelism}" \
>>   -yqu "${queue}" \
>>   -yt "${YARN_APP_PATH}" \
>>   -c "${MAIN_CLASS}" \
>>   -yst \
>>   -yd \
>>   ${class_path} \
>>   "${YARN_APP_PATH}"/"${APP_JAR}"
>>
>> After the job has been successfully deployed, I get an exception:
>>
>> 2018-10-26 18:29:17,781 | ERROR | Curator-Framework-0 | org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl | Background exception was not retry-able or retry gave up
>> java.lang.InterruptedException
>>     at java.lang.Object.wait(Native Method)
>>     at java.lang.Object.wait(Object.java:502)
>>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
>>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1097)
>>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1130)
>>     at org.apache.flink.shaded.curator.org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CreateBuilderImpl$7.performBackgroundOperation(CreateBuilderImpl.java:561)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:72)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:831)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
>>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> If the job is deployed in "attached mode", everything goes fine.
>>
>> Kind Regards,
>> Mike Pryakhin