Hi Andrey,

Thanks a lot for your reply!

> What was the full job life cycle?

1. The job is deployed as a YARN cluster with the following properties set:

       high-availability: zookeeper
       high-availability.zookeeper.quorum: <a list of zookeeper hosts>
       high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
       high-availability.zookeeper.path.root: <flink-root-path>
       high-availability.zookeeper.path.namespace: <flink-job-name>

2. The job is cancelled via the flink cancel <job-id> command.

What I've noticed: while the job is running, the following directory structure is created in Zookeeper:

    /<flink-root-path>/<flink-job-name>/leader/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leader/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leader/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/checkpoints/5c21f00b9162becf5ce25a1cf0e67cde/0000000000000000041
    /<flink-root-path>/<flink-job-name>/checkpoint-counter/5c21f00b9162becf5ce25a1cf0e67cde
    /<flink-root-path>/<flink-job-name>/running_job_registry/5c21f00b9162becf5ce25a1cf0e67cde

When the job is cancelled, some ephemeral nodes disappear, but most of them are still there:

    /<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde
    /<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/checkpoints/
    /<flink-root-path>/<flink-job-name>/checkpoint-counter/
    /<flink-root-path>/<flink-job-name>/running_job_registry/
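For completeness, this is roughly how the remaining znodes can be listed programmatically with Curator (a minimal sketch, not taken from Flink; the quorum address is a placeholder and the root path has to be substituted with the real values):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ListFlinkZkNodes {
        public static void main(String[] args) throws Exception {
            // Placeholder quorum address; use the real
            // high-availability.zookeeper.quorum value here.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            // Substitute the real root/namespace, e.g. "/flink/my-job".
            printTree(client, "/<flink-root-path>/<flink-job-name>");
            client.close();
        }

        // Recursively prints every znode under the given path.
        private static void printTree(CuratorFramework client, String path) throws Exception {
            System.out.println(path);
            for (String child : client.getChildren().forPath(path)) {
                printTree(client, path + "/" + child);
            }
        }
    }

(The same can be done interactively with the ls command in zkCli.sh.)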
> Did you start it with Flink 1.6.1 or canceled job running with 1.6.0?

I started the job with Flink 1.6.1.

> Was there a failover of Job Master while running before the cancelation?

No, there was no failover, as the job is deployed as a YARN cluster (the YARN Cluster High Availability guide states that no failover is required).

> What version of Zookeeper do you use?

Zookeeper 3.4.10.

> In general, it should not be the case and all job related data should be
> cleaned from Zookeeper upon cancellation.

As far as I understood, that issue concerns the JobManager failover process, whereas my question is about a manual, intended cancellation of a job.

Here is the method responsible for cleaning up the Zookeeper folders [1]; it is called when the job manager has stopped [2]. It seems to clean up only the running_job_registry folder, leaving the other folders untouched (sketched below). I supposed that everything under the /<flink-root-path>/<flink-job-name>/ folder would be cleaned up when the job is cancelled.

[1] https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperRunningJobsRegistry.java#L107
[2] https://github.com/apache/flink/blob/f087f57749004790b6f5b823d66822c36ae09927/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L332
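This is my paraphrase of the cleanup step in [1] (names are simplified, not a verbatim copy of ZooKeeperRunningJobsRegistry):

    import java.io.IOException;
    import org.apache.curator.framework.CuratorFramework;

    // Condensed reading of the cleanup in [1]: only the per-job znode under
    // running_job_registry is deleted; the sibling folders (leader/,
    // leaderlatch/, checkpoints/, checkpoint-counter/) are never touched here.
    class RunningJobsRegistryCleanup {

        private final CuratorFramework client;
        private final String registryPathFormat; // e.g. "/running_job_registry/%s"

        RunningJobsRegistryCleanup(CuratorFramework client, String registryPathFormat) {
            this.client = client;
            this.registryPathFormat = registryPathFormat;
        }

        void clearJob(String jobId) throws IOException {
            try {
                // Resolves to something like
                // /<flink-root-path>/<flink-job-name>/running_job_registry/<job-id>
                // relative to the client's namespace.
                client.delete().forPath(String.format(registryPathFormat, jobId));
            } catch (Exception e) {
                throw new IOException("Failed to clear job " + jobId + " from ZooKeeper", e);
            }
        }
    }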
Kind Regards,
Mike Pryakhin

> On 26 Oct 2018, at 12:39, Andrey Zagrebin <and...@data-artisans.com> wrote:
>
> Hi Mike,
>
> What was the full job life cycle?
> Did you start it with Flink 1.6.1 or canceled job running with 1.6.0?
> Was there a failover of Job Master while running before the cancelation?
> What version of Zookeeper do you use?
>
> Flink creates child nodes to create a lock for the job in Zookeeper.
> The lock is removed by removing the (ephemeral) child node.
> A persistent node can be a problem: if the job dies and does not remove it,
> the persistent node will not time out and disappear as an ephemeral one would,
> and the next job instance will not delete it because it is supposed to be
> locked by the previous one.
>
> There was a recent fix in 1.6.1 where the job data was not properly deleted
> from Zookeeper [1].
> In general, it should not be the case and all job related data should be
> cleaned from Zookeeper upon cancellation.
>
> Best,
> Andrey
>
> [1] https://issues.apache.org/jira/browse/FLINK-10011
>
>> On 25 Oct 2018, at 15:30, Mikhail Pryakhin <m.prya...@gmail.com> wrote:
>>
>> Hi Flink experts!
>>
>> When a streaming job with Zookeeper-HA enabled gets cancelled, all the
>> job-related Zookeeper nodes are not removed. Is there a reason behind that?
>> I noticed that Zookeeper paths are created of type "Container Node" (an
>> ephemeral node that can have nested nodes) and fall back to the persistent
>> node type in case Zookeeper doesn't support this sort of node.
>> But anyway, it is worth removing the job's Zookeeper node when a job is
>> cancelled, isn't it?
>>
>> Thank you in advance!
>>
>> Kind Regards,
>> Mike Pryakhin
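P.S. The ephemeral-vs-persistent behaviour Andrey describes can be reproduced with a few lines of Curator (a standalone sketch; the host and the /demo paths are made up):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    public class EphemeralVsPersistent {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            // An ephemeral node is tied to this session and is removed by
            // Zookeeper automatically when the session closes.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL)
                  .forPath("/demo/leader/lock");

            // A persistent node outlives the session and must be deleted
            // explicitly, which is why a crashed job can leave it behind.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.PERSISTENT)
                  .forPath("/demo/metadata");

            // After close(), /demo/leader/lock disappears, while /demo/metadata
            // stays. Note the parent /demo/leader created by
            // creatingParentsIfNeeded() is persistent and also remains, which
            // mirrors the leftover folder structure observed above.
            client.close();
        }
    }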