Having the *start()* in *LeaderContender* interface and bringing back the *LeaderElection* with some new methods make sense to me.
I have no more concerns now. > - *LeaderContender*: The LeaderContender is integrated as usual except > that it accesses the LeaderElection instead of the LeaderElectionService. > It's going to call startLeaderElection(LeaderContender) where, previously, > LeaderElectionService.start(LeaderContender) was called. > > nit: we call the *LeaderElection#startLeaderElection()*, not the *LeaderElection#startLeaderElection(LeaderContender)*. Because we have already set the leaderContender in the *LeaderElection#register(LeaderContender)*. Best, Yang Chesnay Schepler <ches...@apache.org> 于2023年1月23日周一 23:16写道: > Thanks for updating the design. From my side this looks good. > > On 18/01/2023 17:59, Matthias Pohl wrote: > > After another round of discussion, I came up with a (hopefully) final > > proposal. The previously discussed approach was still not optimal because > > the contender ID lived in the LeaderContender even though it is actually > > LeaderElectionService-internal knowledge. Fixing that helped fix the > > overall architecture. Additionally, it brought back the LeaderElection > > interface with slightly different methods. > > > > I updated the "Code Cleanup: Merge MultipleComponentLeaderElectionService > > into LeaderElectionService" section and moved the old proposal into the > > section for rejected alternatives. Feel free to have another look at the > > updated version [1]. > > > > Matthias > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > > > > On Wed, Jan 18, 2023 at 1:40 PM Matthias Pohl <matthias.p...@aiven.io> > > wrote: > > > >> Thanks for participating in the discussion, Yang & Chesnay. > LeaderElection > >> interface extension gave me a headache as well. I added it initially > >> because I thought it would be of more value. But essentially, it doesn't > >> help but make the code harder to understand (as your questions > rightfully > >> point out). I agree that the FLIP is good enough without this > extension. I > >> moved it into the Rejected Alternatives section of the FLIP and would > >> propose going ahead without it. > >> > >> I will answer your questions about the LeaderElection extension, anyway: > >> > >> BTW, if the *LeaderElectionService#register(return LeaderElection)* and > >>> *LeaderElectionService#onGrantLeadership* are guarded by a same lock, > then > >>> we could ensure that the leaderElection in *LeaderContender* is always > >>> non-null when it tries to confirm the leadership. And then we do not > need > >>> the > >>> *LeaderContender#initializeLeaderElection*. Right? > >> > >> No, we still would need LeaderContender#initializeLeaderElection because > >> the LeaderElectionService needs to be capable of setting the > LeaderElection > >> within the LeaderContender before triggering the process for granting > the > >> leadership. This all needs to happen within the > >> LeaderElectionService#register(LeaderContender). It's indepent of the > lock. > >> > >> With the extension, how does the leader contender get access to the > >>> LeaderElection? I would've assumed that LEService returns a > LeaderElection > >>> when register is called, but according to the diagram this method > doesn't > >>> return anything. Is that what initiateLeaderElection is doing? > >> > >> Correct. My initial plan was to make > >> LeaderElectionService#register(LeaderContender) return the > LeaderElection > >> instance. That method could have been called within the LeaderContender. > >> But this approach has the flaw that LeaderContender would be in charge > >> within this control flow where, actually, we would want > >> LeaderElectionService to be still in charge to trigger the process for > >> granting the leadership. This required the > >> LeaderContender.initializeLeaderElection(LeaderElection) method to be > added > >> to enable the LeaderElectionService to do the initialization. I added a > >> comment to the corresponding class diagram to make this clearer. > >> > >> The DefaultLeaderElection will rely on package-private methods of the > >>> DLEService to handle confirm/hasLeadership calls? > >> > >> Correct. I added the missing package-private methods to the class > diagram > >> in the FLIP to clear things up. > >> > >> On Wed, Jan 18, 2023 at 11:47 AM Chesnay Schepler <ches...@apache.org> > >> wrote: > >> > >>> There are a lot of good things in this, and until the Extension bit I'm > >>> fully on board. > >>> > >>> With the extension, how does the leader contender get access to the > >>> LeaderElection? I would've assumed that LEService returns a > >>> LeaderElection when register is called, but according to the diagram > >>> this method doesn't return anything. Is that what > initiateLeaderElection > >>> is doing? > >>> > >>> The DefaultLeaderElection will rely on package-private methods of the > >>> DLEService to handle confirm/hasLeadership calls? > >>> > >>> I don't fully understand why LContender#initializeLeaderElection is > >>> required. > >>> > >>> On 05/01/2023 14:49, Matthias Pohl wrote: > >>>> Hi everyone, > >>>> I brought up FLINK-26522 [1] in the mailing list discussion about > >>>> consolidating the HighAvailabilityServices interfaces [2], previously. > >>>> There, it was concluded that the community still wants the ability to > >>> have > >>>> per-component leader election and, therefore, keep the > >>>> HighAvailabilityServices interface as is. I went back to work on > >>>> FLINK-26522 [1] to figure out how we can simplify the current codebase > >>>> keeping the decision in mind. > >>>> > >>>> I wanted to handle FLINK-26522 [1] as a follow-up cleanup task of > >>>> FLINK-24038 [3]. But while working on it, I realized that even > >>> FLINK-24038 > >>>> [3] shouldn't have been handled without a FLIP. The per-process leader > >>>> election which was introduced in FLINK-24038 [3] changed the ownership > >>> of > >>>> certain components. This is actually a change that should have been > >>>> discussed in the mailing list and deserved a FLIP. To overcome this > >>>> shortcoming of FLINK-24038 [3], I decided to prepare FLIP-285 [4] to > >>>> provide proper documentation of what happened in FLINK-24038 and what > >>> will > >>>> be manifested with resolving its follow-up FLINK-26522 [1]. > >>>> > >>>> Conceptually, this FLIP proposes moving away from Flink's support for > >>>> single-contender LeaderElectionServices and introducing > multi-contender > >>>> support by disconnecting the HA-backend leader election lifecycle from > >>> the > >>>> LeaderContender's lifecycle. This allows us to provide LeaderElection > >>> per > >>>> component (as it was requested in [2]) but also enables us to utilize > a > >>>> single leader election for multiple components/contenders as well > >>> without > >>>> the complexity of the code that was introduced by FLINK-24038 [3]. > >>>> > >>>> I'm looking forward to your comments. > >>>> > >>>> Matthias > >>>> > >>>> [1] https://issues.apache.org/jira/browse/FLINK-26522 > >>>> [2] https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm > >>>> [3] https://issues.apache.org/jira/browse/FLINK-24038 > >>>> [4] > >>>> > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > >>> > >