After another round of discussion, I came up with a (hopefully) final proposal. The previously discussed approach was still not optimal because the contender ID lived in the LeaderContender even though it is actually LeaderElectionService-internal knowledge. Fixing that helped fix the overall architecture. Additionally, it brought back the LeaderElection interface with slightly different methods.
I updated the "Code Cleanup: Merge MultipleComponentLeaderElectionService into LeaderElectionService" section and moved the old proposal into the section for rejected alternatives. Feel free to have another look at the updated version [1]. Matthias [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box On Wed, Jan 18, 2023 at 1:40 PM Matthias Pohl <matthias.p...@aiven.io> wrote: > Thanks for participating in the discussion, Yang & Chesnay. LeaderElection > interface extension gave me a headache as well. I added it initially > because I thought it would be of more value. But essentially, it doesn't > help but make the code harder to understand (as your questions rightfully > point out). I agree that the FLIP is good enough without this extension. I > moved it into the Rejected Alternatives section of the FLIP and would > propose going ahead without it. > > I will answer your questions about the LeaderElection extension, anyway: > > BTW, if the *LeaderElectionService#register(return LeaderElection)* and >> *LeaderElectionService#onGrantLeadership* are guarded by a same lock, then >> we could ensure that the leaderElection in *LeaderContender* is always >> non-null when it tries to confirm the leadership. And then we do not need >> the >> *LeaderContender#initializeLeaderElection*. Right? > > > No, we still would need LeaderContender#initializeLeaderElection because > the LeaderElectionService needs to be capable of setting the LeaderElection > within the LeaderContender before triggering the process for granting the > leadership. This all needs to happen within the > LeaderElectionService#register(LeaderContender). It's indepent of the lock. > > With the extension, how does the leader contender get access to the >> LeaderElection? I would've assumed that LEService returns a LeaderElection >> when register is called, but according to the diagram this method doesn't >> return anything. Is that what initiateLeaderElection is doing? > > > Correct. My initial plan was to make > LeaderElectionService#register(LeaderContender) return the LeaderElection > instance. That method could have been called within the LeaderContender. > But this approach has the flaw that LeaderContender would be in charge > within this control flow where, actually, we would want > LeaderElectionService to be still in charge to trigger the process for > granting the leadership. This required the > LeaderContender.initializeLeaderElection(LeaderElection) method to be added > to enable the LeaderElectionService to do the initialization. I added a > comment to the corresponding class diagram to make this clearer. > > The DefaultLeaderElection will rely on package-private methods of the >> DLEService to handle confirm/hasLeadership calls? > > > Correct. I added the missing package-private methods to the class diagram > in the FLIP to clear things up. > > On Wed, Jan 18, 2023 at 11:47 AM Chesnay Schepler <ches...@apache.org> > wrote: > >> There are a lot of good things in this, and until the Extension bit I'm >> fully on board. >> >> With the extension, how does the leader contender get access to the >> LeaderElection? I would've assumed that LEService returns a >> LeaderElection when register is called, but according to the diagram >> this method doesn't return anything. Is that what initiateLeaderElection >> is doing? >> >> The DefaultLeaderElection will rely on package-private methods of the >> DLEService to handle confirm/hasLeadership calls? >> >> I don't fully understand why LContender#initializeLeaderElection is >> required. >> >> On 05/01/2023 14:49, Matthias Pohl wrote: >> > Hi everyone, >> > I brought up FLINK-26522 [1] in the mailing list discussion about >> > consolidating the HighAvailabilityServices interfaces [2], previously. >> > There, it was concluded that the community still wants the ability to >> have >> > per-component leader election and, therefore, keep the >> > HighAvailabilityServices interface as is. I went back to work on >> > FLINK-26522 [1] to figure out how we can simplify the current codebase >> > keeping the decision in mind. >> > >> > I wanted to handle FLINK-26522 [1] as a follow-up cleanup task of >> > FLINK-24038 [3]. But while working on it, I realized that even >> FLINK-24038 >> > [3] shouldn't have been handled without a FLIP. The per-process leader >> > election which was introduced in FLINK-24038 [3] changed the ownership >> of >> > certain components. This is actually a change that should have been >> > discussed in the mailing list and deserved a FLIP. To overcome this >> > shortcoming of FLINK-24038 [3], I decided to prepare FLIP-285 [4] to >> > provide proper documentation of what happened in FLINK-24038 and what >> will >> > be manifested with resolving its follow-up FLINK-26522 [1]. >> > >> > Conceptually, this FLIP proposes moving away from Flink's support for >> > single-contender LeaderElectionServices and introducing multi-contender >> > support by disconnecting the HA-backend leader election lifecycle from >> the >> > LeaderContender's lifecycle. This allows us to provide LeaderElection >> per >> > component (as it was requested in [2]) but also enables us to utilize a >> > single leader election for multiple components/contenders as well >> without >> > the complexity of the code that was introduced by FLINK-24038 [3]. >> > >> > I'm looking forward to your comments. >> > >> > Matthias >> > >> > [1] https://issues.apache.org/jira/browse/FLINK-26522 >> > [2] https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm >> > [3] https://issues.apache.org/jira/browse/FLINK-24038 >> > [4] >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box >> > >> >>