Thanks for driving this effort, Yangze! The proposal overall LGTM. Apart from
the throughput improvement in OLAP scenarios, separating the leader
election/discovery services from the metadata persistence services will also
make the HA implementation clearer and easier to maintain. Just a minor comment
on naming: would it be better to rename PersistentServices to
PersistenceServices, since we usually put a noun before Services?

Best,
Zhanghao Chen
________________________________
From: Yangze Guo <karma...@gmail.com>
Sent: Tuesday, December 19, 2023 17:33
To: dev <dev@flink.apache.org>
Subject: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

Hi there,

We would like to start a discussion thread on "FLIP-403: High
Availability Services for OLAP Scenarios"[1].

Currently, Flink's high availability service consists of two
mechanisms: leader election/retrieval services for JobManager and
persistent services for job metadata. However, these mechanisms are
set up in an "all or nothing" manner. In OLAP scenarios, we typically
only require leader election/retrieval services for JobManager
components since jobs usually do not have a restart strategy.
Additionally, the persistence of job states can negatively impact the
cluster's throughput, especially for short query jobs.

To address these issues, this FLIP proposes splitting the
HighAvailabilityServices into LeaderServices and PersistentServices,
and enabling users to independently configure the high availability
strategies specifically related to jobs.
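
As a rough illustration of the split, here is a minimal Java sketch. Every
name in it (LeaderServicesSketch, PersistenceServicesSketch, NoOpPersistence,
and so on) is hypothetical and chosen for this example only; these are not the
interfaces defined in the FLIP or in Flink. The sketch only shows how leader
election/retrieval could stay enabled while job metadata persistence is
swapped for a no-op in an OLAP setup.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All types below are hypothetical illustrations, not Flink or FLIP-403 APIs.

/** Leader election/retrieval side: still needed for JobManager components. */
interface LeaderServicesSketch {
    String currentLeaderAddress();
}

/** Job metadata persistence side: configurable independently of leadership. */
interface PersistenceServicesSketch {
    void storeJobMetadata(String jobId, String metadata);
    Optional<String> recoverJobMetadata(String jobId);
}

/** Trivial leader service that always reports one fixed address. */
final class FixedLeaderServices implements LeaderServicesSketch {
    private final String address;
    FixedLeaderServices(String address) { this.address = address; }
    @Override public String currentLeaderAddress() { return address; }
}

/** Stand-in for a durable store; an in-memory map replaces external storage here. */
final class MapBackedPersistence implements PersistenceServicesSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    @Override public void storeJobMetadata(String jobId, String metadata) {
        store.put(jobId, metadata);
    }
    @Override public Optional<String> recoverJobMetadata(String jobId) {
        return Optional.ofNullable(store.get(jobId));
    }
}

/** OLAP-style choice: discard job metadata so short queries skip the persistence cost. */
final class NoOpPersistence implements PersistenceServicesSketch {
    @Override public void storeJobMetadata(String jobId, String metadata) {
        // Intentionally does nothing.
    }
    @Override public Optional<String> recoverJobMetadata(String jobId) {
        return Optional.empty();
    }
}

/** Composition of the two halves, each chosen on its own. */
final class HaServicesSketch {
    final LeaderServicesSketch leader;
    final PersistenceServicesSketch persistence;
    HaServicesSketch(LeaderServicesSketch leader, PersistenceServicesSketch persistence) {
        this.leader = leader;
        this.persistence = persistence;
    }
}
```

With this shape, an OLAP cluster would pair a real leader service with
NoOpPersistence, while a streaming cluster would keep both halves durable.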

Please find more details in the FLIP wiki document [1]. Looking
forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-403+High+Availability+Services+for+OLAP+Scenarios

Best,
Yangze Guo
