After another round of discussion, I came up with a (hopefully) final
proposal. The previously discussed approach was still not optimal because
the contender ID lived in the LeaderContender even though it is actually
LeaderElectionService-internal knowledge. Fixing that helped fix the
overall architecture. Additionally, it brought back the LeaderElection
interface with slightly different methods.

I updated the "Code Cleanup: Merge MultipleComponentLeaderElectionService
into LeaderElectionService" section and moved the old proposal into the
section for rejected alternatives. Feel free to have another look at the
updated version [1].

Matthias

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box

On Wed, Jan 18, 2023 at 1:40 PM Matthias Pohl <matthias.p...@aiven.io>
wrote:

> Thanks for participating in the discussion, Yang & Chesnay. LeaderElection
> interface extension gave me a headache as well. I added it initially
> because I thought it would be of more value. But essentially, it doesn't
> help but make the code harder to understand (as your questions rightfully
> point out). I agree that the FLIP is good enough without this extension. I
> moved it into the Rejected Alternatives section of the FLIP and would
> propose going ahead without it.
>
> I will answer your questions about the LeaderElection extension, anyway:
>
> BTW, if the *LeaderElectionService#register(return LeaderElection)* and
>> *LeaderElectionService#onGrantLeadership* are guarded by a same lock, then
>> we could ensure that the leaderElection in *LeaderContender* is always
>> non-null when it tries to confirm the leadership. And then we do not need
>> the
>> *LeaderContender#initializeLeaderElection*. Right?
>
>
> No, we still would need LeaderContender#initializeLeaderElection because
> the LeaderElectionService needs to be capable of setting the LeaderElection
> within the LeaderContender before triggering the process for granting the
> leadership. This all needs to happen within the
> LeaderElectionService#register(LeaderContender). It's indepent of the lock.
>
> With the extension, how does the leader contender get access to the
>> LeaderElection? I would've assumed that LEService returns a LeaderElection
>> when register is called, but according to the diagram this method doesn't
>> return anything. Is that what initiateLeaderElection is doing?
>
>
> Correct. My initial plan was to make
> LeaderElectionService#register(LeaderContender) return the LeaderElection
> instance. That method could have been called within the LeaderContender.
> But this approach has the flaw that LeaderContender would be in charge
> within this control flow where, actually, we would want
> LeaderElectionService to be still in charge to trigger the process for
> granting the leadership. This required the
> LeaderContender.initializeLeaderElection(LeaderElection) method to be added
> to enable the LeaderElectionService to do the initialization. I added a
> comment to the corresponding class diagram to make this clearer.
>
> The DefaultLeaderElection will rely on package-private methods of the
>> DLEService to handle confirm/hasLeadership calls?
>
>
> Correct. I added the missing package-private methods to the class diagram
> in the FLIP to clear things up.
>
> On Wed, Jan 18, 2023 at 11:47 AM Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> There are a lot of good things in this, and until the Extension bit I'm
>> fully on board.
>>
>> With the extension, how does the leader contender get access to the
>> LeaderElection? I would've assumed that LEService returns a
>> LeaderElection when register is called, but according to the diagram
>> this method doesn't return anything. Is that what initiateLeaderElection
>> is doing?
>>
>> The DefaultLeaderElection will rely on package-private methods of the
>> DLEService to handle confirm/hasLeadership calls?
>>
>> I don't fully understand why LContender#initializeLeaderElection is
>> required.
>>
>> On 05/01/2023 14:49, Matthias Pohl wrote:
>> > Hi everyone,
>> > I brought up FLINK-26522 [1] in the mailing list discussion about
>> > consolidating the HighAvailabilityServices interfaces [2], previously.
>> > There, it was concluded that the community still wants the ability to
>> have
>> > per-component leader election and, therefore, keep the
>> > HighAvailabilityServices interface as is. I went back to work on
>> > FLINK-26522 [1] to figure out how we can simplify the current codebase
>> > keeping the decision in mind.
>> >
>> > I wanted to handle FLINK-26522 [1] as a follow-up cleanup task of
>> > FLINK-24038 [3]. But while working on it, I realized that even
>> FLINK-24038
>> > [3] shouldn't have been handled without a FLIP. The per-process leader
>> > election which was introduced in FLINK-24038 [3] changed the ownership
>> of
>> > certain components. This is actually a change that should have been
>> > discussed in the mailing list and deserved a FLIP. To overcome this
>> > shortcoming of FLINK-24038 [3], I decided to prepare FLIP-285 [4] to
>> > provide proper documentation of what happened in FLINK-24038 and what
>> will
>> > be manifested with resolving its follow-up FLINK-26522 [1].
>> >
>> > Conceptually, this FLIP proposes moving away from Flink's support for
>> > single-contender LeaderElectionServices and introducing multi-contender
>> > support by disconnecting the HA-backend leader election lifecycle from
>> the
>> > LeaderContender's lifecycle. This allows us to provide LeaderElection
>> per
>> > component (as it was requested in [2]) but also enables us to utilize a
>> > single leader election for multiple components/contenders as well
>> without
>> > the complexity of the code that was introduced by FLINK-24038 [3].
>> >
>> > I'm looking forward to your comments.
>> >
>> > Matthias
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-26522
>> > [2] https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm
>> > [3] https://issues.apache.org/jira/browse/FLINK-24038
>> > [4]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box
>> >
>>
>>

Reply via email to