[jira] [Commented] (FLINK-33053) Watcher leak in Zookeeper HA mode

Yangze Guo (Jira) Wed, 06 Sep 2023 23:24:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-33053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762579#comment-17762579 ]


Yangze Guo commented on FLINK-33053:
------------------------------------

In our test, the log shows that the ZooKeeperLeaderRetrievalDriver was closed
correctly, so this is likely a Curator issue.

> Watcher leak in Zookeeper HA mode
> ---------------------------------
>
>                 Key: FLINK-33053
>                 URL: https://issues.apache.org/jira/browse/FLINK-33053
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.0, 1.17.1
>            Reporter: Yangze Guo
>            Priority: Critical
>
> We observe a watcher leak in our OLAP stress test when ZooKeeper HA mode is
> enabled. The TaskManagers' watches on the JobMaster leader are not removed
> after the job finishes.
> Here is how we reproduce this issue:
>  - Start a session cluster with ZooKeeper HA mode enabled.
>  - Continuously and concurrently submit short queries (e.g. WordCount) to the
> cluster.
>  - Run echo -n wchp | nc {zk host} {zk port} to list the current watches.
> Many watches remain on
> /flink/{cluster_name}/leader/{job_id}/connection_info.
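
A minimal sketch for tallying these leftover watches per job, assuming the
wchp output lists each watched znode path at the start of a line, followed by
one indented line per watching session ID. The helpers fetch_wchp and
count_connection_info_watches are hypothetical and not part of Flink or
ZooKeeper. On ZooKeeper 3.5+ the server only answers four-letter words listed
in 4lw.commands.whitelist, so wchp may need to be enabled first.

```python
import re
import socket
from collections import Counter

# Hypothetical pattern for the leader znode named in the issue:
# /flink/{cluster_name}/leader/{job_id}/connection_info
LEADER_PATH = re.compile(
    r"^/flink/(?P<cluster>[^/]+)/leader/(?P<job>[^/]+)/connection_info$"
)


def fetch_wchp(host: str, port: int) -> str:
    """Send the 'wchp' four-letter word and return the raw text reply.

    Equivalent to: echo -n wchp | nc {zk host} {zk port}
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"wchp")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_connection_info_watches(wchp_output: str) -> Counter:
    """Count watching sessions per job on .../leader/{job_id}/connection_info.

    Assumes path lines start at column 0 and session lines are indented.
    """
    counts: Counter = Counter()
    current_job = None
    for line in wchp_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A znode path line: remember the job id if it is a leader znode.
            match = LEADER_PATH.match(line.strip())
            current_job = match.group("job") if match else None
        elif current_job is not None:
            # An indented session line under a matching path is one watch.
            counts[current_job] += 1
    return counts
```

Against a live server this would be called as
count_connection_info_watches(fetch_wchp("zk-host", 2181)), where the host
and port are placeholders. A job whose count stays non-zero after it has
finished would match the symptom described above.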


