[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xiangyu feng updated FLINK-32756:
---------------------------------
    Description: 
Currently, every newly built RestClusterClient creates a new ClientHighAvailabilityServices, which is both unnecessary and resource-consuming. For example, each ZooKeeperClientHAServices contains a ZKClient, which holds a connection to the ZK server and several related threads.

By reusing ClientHighAvailabilityServices across multiple RestClusterClient instances, we can save system resources (threads, connections), connection establishment time, and leader retrieval time.

  was:
In an OLAP scenario, we submit queries to a Flink session cluster through the flink-sql-gateway service. When it receives queries, the gateway service creates sessions to handle them; each session creates a new RestClusterClient and a new ClientHAServices.

In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices for service discovery. Each ZKClientHAServices establishes a network connection with ZK and creates four ZK-related threads.

When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 ZK-related threads exist simultaneously. This poses a significant stability risk in production.

To address this problem, we have implemented SharedZKClientHAService so that different sessions share a ZK connection and ZKClient. This works well in our production.

> Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-32756
>                 URL: https://issues.apache.org/jira/browse/FLINK-32756
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Client / Job Submission
>            Reporter: xiangyu feng
>            Priority: Major
>
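
To illustrate the reuse idea described above, here is a minimal, self-contained sketch of a reference-counted holder that lets several clients borrow one heavyweight service instance. It is not Flink code and not the patch for this issue: `FakeHaServices`, `SharedHolder`, and `FakeClusterClient` are hypothetical stand-ins for ClientHighAvailabilityServices, the sharing mechanism, and RestClusterClient respectively, and reference counting is just one possible way to decide when the shared instance can be closed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class SharedHaServicesDemo {

    /** Hypothetical stand-in for a heavyweight HA service (e.g. one ZK connection plus threads). */
    static final class FakeHaServices {
        static final AtomicInteger OPENED = new AtomicInteger();
        static final AtomicInteger CLOSED = new AtomicInteger();

        FakeHaServices() {
            OPENED.incrementAndGet(); // counts how many "connections" were created
        }

        void close() {
            CLOSED.incrementAndGet(); // counts how many were torn down
        }
    }

    /** Reference-counted holder: creates lazily on first acquire, closes on last release. */
    static final class SharedHolder<T> {
        private final Supplier<T> factory;
        private final Consumer<T> closer;
        private T instance;
        private int refCount;

        SharedHolder(Supplier<T> factory, Consumer<T> closer) {
            this.factory = factory;
            this.closer = closer;
        }

        synchronized T acquire() {
            if (refCount == 0) {
                instance = factory.get(); // first user pays the creation cost
            }
            refCount++;
            return instance;
        }

        synchronized void release() {
            if (refCount == 0) {
                throw new IllegalStateException("release() without matching acquire()");
            }
            if (--refCount == 0) {
                T toClose = instance;
                instance = null;
                closer.accept(toClose); // last user triggers the actual close
            }
        }
    }

    /** Hypothetical client that borrows the shared services instead of creating its own. */
    static final class FakeClusterClient implements AutoCloseable {
        private final SharedHolder<FakeHaServices> holder;
        final FakeHaServices haServices;
        private boolean closed;

        FakeClusterClient(SharedHolder<FakeHaServices> holder) {
            this.holder = holder;
            this.haServices = holder.acquire();
        }

        @Override
        public void close() {
            if (!closed) { // make close() idempotent so the count is released only once
                closed = true;
                holder.release();
            }
        }
    }

    public static void main(String[] args) {
        SharedHolder<FakeHaServices> holder =
                new SharedHolder<>(FakeHaServices::new, FakeHaServices::close);
        FakeClusterClient a = new FakeClusterClient(holder);
        FakeClusterClient b = new FakeClusterClient(holder);
        System.out.println("same instance: " + (a.haServices == b.haServices));
        System.out.println("opened: " + FakeHaServices.OPENED.get());
        a.close();
        System.out.println("closed after first release: " + FakeHaServices.CLOSED.get());
        b.close();
        System.out.println("closed after last release: " + FakeHaServices.CLOSED.get());
    }
}
```

In this sketch, any number of clients created from the same holder share one underlying instance, so the connection and thread cost is paid once rather than once per session. A real implementation would also need to consider configuration differences between clients and failure handling for the shared connection, which are left out here.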