Hi Andrey,

Thanks a lot for your reply!

> What was the full job life cycle?

1. The job is deployed as a YARN cluster with the following properties set:

       high-availability: zookeeper
       high-availability.zookeeper.quorum: <a list of zookeeper hosts>
       high-availability.zookeeper.storageDir: hdfs:///<recovery-folder-path>
       high-availability.zookeeper.path.root: <flink-root-path>
       high-availability.zookeeper.path.namespace: <flink-job-name>

2. The job is cancelled via the flink cancel <job-id> command.

What I've noticed: while the job is running, the following directory structure is created in Zookeeper:

    /<flink-root-path>/<flink-job-name>/leader/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leader/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leader/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/checkpoints/5c21f00b9162becf5ce25a1cf0e67cde/0000000000000000041
    /<flink-root-path>/<flink-job-name>/checkpoint-counter/5c21f00b9162becf5ce25a1cf0e67cde
    /<flink-root-path>/<flink-job-name>/running_job_registry/5c21f00b9162becf5ce25a1cf0e67cde

When the job is cancelled, some ephemeral nodes disappear, but most of them are still there:

    /<flink-root-path>/<flink-job-name>/leader/5c21f00b9162becf5ce25a1cf0e67cde
    /<flink-root-path>/<flink-job-name>/leaderlatch/resource_manager_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/rest_server_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/dispatcher_lock
    /<flink-root-path>/<flink-job-name>/leaderlatch/5c21f00b9162becf5ce25a1cf0e67cde/job_manager_lock
    /<flink-root-path>/<flink-job-name>/checkpoints/
    /<flink-root-path>/<flink-job-name>/checkpoint-counter/
    /<flink-root-path>/<flink-job-name>/running_job_registry/
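For completeness, this is roughly how the remaining znodes can be listed programmatically with Curator (a minimal sketch, not taken from Flink; the quorum address is a placeholder and the root path has to be substituted with the real values):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ListFlinkZkNodes {
        public static void main(String[] args) throws Exception {
            // Placeholder quorum address; use the real
            // high-availability.zookeeper.quorum value here.
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            // Substitute the real root/namespace, e.g. "/flink/my-job".
            printTree(client, "/<flink-root-path>/<flink-job-name>");
            client.close();
        }

        // Recursively prints every znode under the given path.
        private static void printTree(CuratorFramework client, String path) throws Exception {
            System.out.println(path);
            for (String child : client.getChildren().forPath(path)) {
                printTree(client, path + "/" + child);
            }
        }
    }

(The same can be done interactively with the ls command in zkCli.sh.)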
> Did you start it with Flink 1.6.1 or canceled job running with 1.6.0?

I started the job with Flink 1.6.1.

> Was there a failover of Job Master while running before the cancelation?

No, there was no failover, as the job is deployed as a YARN cluster (the YARN Cluster High Availability guide states that no failover is required).

> What version of Zookeeper do you use?

Zookeeper 3.4.10.

> In general, it should not be the case and all job related data should be
> cleaned from Zookeeper upon cancellation.

As far as I understood, that issue concerns the JobManager failover process, whereas my question is about a manual, intended cancellation of a job.

Here is the method responsible for cleaning up the Zookeeper folders [1]; it is called when the job manager has stopped [2]. It seems to clean up only the running_job_registry folder, leaving the other folders untouched (sketched below). I supposed that everything under the /<flink-root-path>/<flink-job-name>/ folder would be cleaned up when the job is cancelled.

[1] https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/zookeeper/ZooKeeperRunningJobsRegistry.java#L107
[2] https://github.com/apache/flink/blob/f087f57749004790b6f5b823d66822c36ae09927/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L332
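This is my paraphrase of the cleanup step in [1] (names are simplified, not a verbatim copy of ZooKeeperRunningJobsRegistry):

    import java.io.IOException;
    import org.apache.curator.framework.CuratorFramework;

    // Condensed reading of the cleanup in [1]: only the per-job znode under
    // running_job_registry is deleted; the sibling folders (leader/,
    // leaderlatch/, checkpoints/, checkpoint-counter/) are never touched here.
    class RunningJobsRegistryCleanup {

        private final CuratorFramework client;
        private final String registryPathFormat; // e.g. "/running_job_registry/%s"

        RunningJobsRegistryCleanup(CuratorFramework client, String registryPathFormat) {
            this.client = client;
            this.registryPathFormat = registryPathFormat;
        }

        void clearJob(String jobId) throws IOException {
            try {
                // Resolves to something like
                // /<flink-root-path>/<flink-job-name>/running_job_registry/<job-id>
                // relative to the client's namespace.
                client.delete().forPath(String.format(registryPathFormat, jobId));
            } catch (Exception e) {
                throw new IOException("Failed to clear job " + jobId + " from ZooKeeper", e);
            }
        }
    }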
Kind Regards,
Mike Pryakhin

> On 26 Oct 2018, at 12:39, Andrey Zagrebin <and...@data-artisans.com> wrote:
>
> Hi Mike,
>
> What was the full job life cycle?
> Did you start it with Flink 1.6.1 or canceled job running with 1.6.0?
> Was there a failover of Job Master while running before the cancelation?
> What version of Zookeeper do you use?
>
> Flink creates child nodes to create a lock for the job in Zookeeper.
> The lock is removed by removing the (ephemeral) child node.
> A persistent node can be a problem: if the job dies and does not remove it,
> the persistent node will not time out and disappear as an ephemeral one would,
> and the next job instance will not delete it because it is supposed to be
> locked by the previous one.
>
> There was a recent fix in 1.6.1 where the job data was not properly deleted
> from Zookeeper [1].
> In general, it should not be the case and all job related data should be
> cleaned from Zookeeper upon cancellation.
>
> Best,
> Andrey
>
> [1] https://issues.apache.org/jira/browse/FLINK-10011
>
>> On 25 Oct 2018, at 15:30, Mikhail Pryakhin <m.prya...@gmail.com> wrote:
>>
>> Hi Flink experts!
>>
>> When a streaming job with Zookeeper-HA enabled gets cancelled, all the
>> job-related Zookeeper nodes are not removed. Is there a reason behind that?
>> I noticed that Zookeeper paths are created of type "Container Node" (an
>> ephemeral node that can have nested nodes) and fall back to the persistent
>> node type in case Zookeeper doesn't support this sort of node.
>> But anyway, it is worth removing the job's Zookeeper node when a job is
>> cancelled, isn't it?
>>
>> Thank you in advance!
>>
>> Kind Regards,
>> Mike Pryakhin
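P.S. The ephemeral-vs-persistent behaviour Andrey describes can be reproduced with a few lines of Curator (a standalone sketch; the host and the /demo paths are made up):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;

    public class EphemeralVsPersistent {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            // An ephemeral node is tied to this session and is removed by
            // Zookeeper automatically when the session closes.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL)
                  .forPath("/demo/leader/lock");

            // A persistent node outlives the session and must be deleted
            // explicitly, which is why a crashed job can leave it behind.
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.PERSISTENT)
                  .forPath("/demo/metadata");

            // After close(), /demo/leader/lock disappears, while /demo/metadata
            // stays. Note the parent /demo/leader created by
            // creatingParentsIfNeeded() is persistent and also remains, which
            // mirrors the leftover folder structure observed above.
            client.close();
        }
    }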