Thanks for participating in the discussion, Yang & Chesnay. The LeaderElection
interface extension gave me a headache as well. I added it initially
because I thought it would add more value. But essentially, it doesn't
help and only makes the code harder to understand (as your questions
rightfully point out). I agree that the FLIP is good enough without this
extension. I moved it into the Rejected Alternatives section of the FLIP and
would propose going ahead without it.
I will answer your questions about the LeaderElection extension anyway:
BTW, if the *LeaderElectionService#register(return LeaderElection)* and
*LeaderElectionService#onGrantLeadership* are guarded by a same lock, then
we could ensure that the leaderElection in *LeaderContender* is always
non-null when it tries to confirm the leadership. And then we do not need
the *LeaderContender#initializeLeaderElection*. Right?
No, we would still need LeaderContender#initializeLeaderElection because
the LeaderElectionService needs to be capable of setting the LeaderElection
within the LeaderContender before triggering the process for granting the
leadership. This all needs to happen within
LeaderElectionService#register(LeaderContender). It's independent of the lock.
With the extension, how does the leader contender get access to the
LeaderElection? I would've assumed that LEService returns a LeaderElection
when register is called, but according to the diagram this method doesn't
return anything. Is that what initiateLeaderElection is doing?
Correct. My initial plan was to make
LeaderElectionService#register(LeaderContender) return the LeaderElection
instance. That method could have been called within the LeaderContender.
But this approach has the flaw that the LeaderContender would be in charge
of this control flow where, actually, we want the LeaderElectionService to
still be in charge of triggering the process for granting the leadership.
This required adding the
LeaderContender.initializeLeaderElection(LeaderElection) method to enable
the LeaderElectionService to do the initialization. I added a comment to
the corresponding class diagram to make this clearer.
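To illustrate the control flow described here, this is a minimal sketch of what I mean. It is not the code from the FLIP: the nested types are simplified, hypothetical stand-ins (single-argument confirmLeadership, no locking, no HA backend), and the sketch assumes that leadership is already held when register is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Simplified, hypothetical stand-ins; not the actual Flink interfaces.
class RegisterFlowSketch {

    interface LeaderElection {
        void confirmLeadership(UUID leaderSessionId);
    }

    interface LeaderContender {
        void initializeLeaderElection(LeaderElection leaderElection);

        void grantLeadership(UUID leaderSessionId);
    }

    static class LeaderElectionService {
        // Records confirmations so that the flow can be observed.
        final List<UUID> confirmedSessionIds = new ArrayList<>();

        // register() returns nothing: the service stays in charge of the flow.
        void register(LeaderContender contender) {
            LeaderElection leaderElection =
                    sessionId -> confirmedSessionIds.add(sessionId);
            // 1. Hand the LeaderElection over to the contender ...
            contender.initializeLeaderElection(leaderElection);
            // 2. ... and only then trigger the granting of the leadership
            //    (assumption: leadership was already acquired).
            contender.grantLeadership(UUID.randomUUID());
        }
    }

    static class Contender implements LeaderContender {
        private LeaderElection leaderElection;

        @Override
        public void initializeLeaderElection(LeaderElection leaderElection) {
            this.leaderElection = leaderElection;
        }

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            // Non-null here because the service initialized it before granting.
            leaderElection.confirmLeadership(leaderSessionId);
        }
    }
}
```

If register returned the LeaderElection instead, the grant in step 2 could reach the contender before the contender had stored the returned instance.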
The DefaultLeaderElection will rely on package-private methods of the
DLEService to handle confirm/hasLeadership calls?
Correct. I added the missing package-private methods to the class diagram
in the FLIP to clear things up.
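Roughly, the delegation could look like the following sketch. The method names on the service side, the grantForIllustration helper, and the field names are assumptions for illustration only; the class diagram in the FLIP is the reference.

```java
import java.util.UUID;

// Hypothetical sketch; names and signatures are assumptions, not Flink code.
class PackagePrivateSketch {

    interface LeaderElection {
        void confirmLeadership(UUID leaderSessionId, String leaderAddress);

        boolean hasLeadership(UUID leaderSessionId);
    }

    static class DefaultLeaderElectionService {
        private UUID issuedSessionId; // null while leadership is not held
        String confirmedAddress;      // exposed only for illustration

        // Stand-in for the HA backend granting leadership.
        synchronized void grantForIllustration(UUID leaderSessionId) {
            issuedSessionId = leaderSessionId;
        }

        // No access modifier: package-private, only meant to be called by
        // DefaultLeaderElection.
        synchronized void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
            if (hasLeadership(leaderSessionId)) {
                confirmedAddress = leaderAddress;
            }
        }

        synchronized boolean hasLeadership(UUID leaderSessionId) {
            return leaderSessionId != null && leaderSessionId.equals(issuedSessionId);
        }
    }

    // Thin facade that the LeaderContender gets to see.
    static class DefaultLeaderElection implements LeaderElection {
        private final DefaultLeaderElectionService parentService;

        DefaultLeaderElection(DefaultLeaderElectionService parentService) {
            this.parentService = parentService;
        }

        @Override
        public void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
            parentService.confirmLeadership(leaderSessionId, leaderAddress);
        }

        @Override
        public boolean hasLeadership(UUID leaderSessionId) {
            return parentService.hasLeadership(leaderSessionId);
        }
    }
}
```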
On Wed, Jan 18, 2023 at 11:47 AM Chesnay Schepler <ches...@apache.org>
wrote:
There are a lot of good things in this, and until the Extension bit I'm
fully on board.
With the extension, how does the leader contender get access to the
LeaderElection? I would've assumed that LEService returns a
LeaderElection when register is called, but according to the diagram
this method doesn't return anything. Is that what initiateLeaderElection
is doing?
The DefaultLeaderElection will rely on package-private methods of the
DLEService to handle confirm/hasLeadership calls?
I don't fully understand why LContender#initializeLeaderElection is
required.
On 05/01/2023 14:49, Matthias Pohl wrote:
Hi everyone,
I previously brought up FLINK-26522 [1] in the mailing list discussion
about consolidating the HighAvailabilityServices interfaces [2].
There, it was concluded that the community still wants the ability to have
per-component leader election and, therefore, to keep the
HighAvailabilityServices interface as is. I went back to work on
FLINK-26522 [1] to figure out how we can simplify the current codebase
while keeping that decision in mind.
I wanted to handle FLINK-26522 [1] as a follow-up cleanup task of
FLINK-24038 [3]. But while working on it, I realized that even FLINK-24038
[3] shouldn't have been handled without a FLIP. The per-process leader
election which was introduced in FLINK-24038 [3] changed the ownership of
certain components. This is actually a change that should have been
discussed in the mailing list and deserved a FLIP. To overcome this
shortcoming of FLINK-24038 [3], I decided to prepare FLIP-285 [4] to
provide proper documentation of what happened in FLINK-24038 and what will
be manifested with resolving its follow-up FLINK-26522 [1].
Conceptually, this FLIP proposes moving away from Flink's support for
single-contender LeaderElectionServices and introducing multi-contender
support by disconnecting the HA-backend leader election lifecycle from the
LeaderContender's lifecycle. This allows us to provide LeaderElection per
component (as it was requested in [2]) but also enables us to utilize a
single leader election for multiple components/contenders as well without
the complexity of the code that was introduced by FLINK-24038 [3].
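As a rough illustration of the multi-contender idea (not the design from the FLIP itself), a single backend leadership event could be fanned out to all registered contenders. All names below, including the contender IDs and the onLeadershipAcquired/onLeadershipLost callbacks, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch; names and signatures are assumptions, not Flink code.
class MultiContenderSketch {

    interface LeaderContender {
        void grantLeadership(UUID leaderSessionId);

        void revokeLeadership();
    }

    static class MultiContenderLeaderElectionService {
        private final Map<String, LeaderContender> contenders = new LinkedHashMap<>();
        private UUID currentSessionId; // null while leadership is not held

        // Contenders may come and go independently of the backend's lifecycle.
        synchronized void register(String contenderId, LeaderContender contender) {
            contenders.put(contenderId, contender);
            if (currentSessionId != null) {
                // Leadership is already held: late contenders are granted right away.
                contender.grantLeadership(currentSessionId);
            }
        }

        synchronized void remove(String contenderId) {
            LeaderContender removed = contenders.remove(contenderId);
            if (removed != null && currentSessionId != null) {
                removed.revokeLeadership();
            }
        }

        // Called once by the (single) HA-backend leader election.
        synchronized void onLeadershipAcquired(UUID leaderSessionId) {
            currentSessionId = leaderSessionId;
            contenders.values().forEach(c -> c.grantLeadership(leaderSessionId));
        }

        synchronized void onLeadershipLost() {
            currentSessionId = null;
            contenders.values().forEach(LeaderContender::revokeLeadership);
        }
    }

    // Records the last granted session ID so that the fan-out can be observed.
    static class RecordingContender implements LeaderContender {
        UUID sessionId;

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            sessionId = leaderSessionId;
        }

        @Override
        public void revokeLeadership() {
            sessionId = null;
        }
    }
}
```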
I'm looking forward to your comments.
Matthias
[1] https://issues.apache.org/jira/browse/FLINK-26522
[2] https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm
[3] https://issues.apache.org/jira/browse/FLINK-24038
[4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box