Thanks for the discussion. I will go ahead then and refactor the code
without touching the HighAvailabilityServices interface. We will keep the
per-component factory methods and return a single leader election instance
for all of them. Other HighAvailabilityServices implementations can then
come up with a more fine-grained per-component leader election if they
want. I will document the problems that arise with such a per-component
implementation, as described in FLINK-24038 [1].
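
To make the plan concrete, here is a minimal sketch of the idea, not the
actual Flink code. All names in it (ElectionService, HaServicesSketch,
PerProcessHaServices, SharedElectionDemo) are hypothetical, simplified
stand-ins for the real interfaces. It only shows that the per-component
factory methods can stay while every one of them hands out the same
per-process instance:

```java
// Hypothetical, simplified stand-ins; these are NOT Flink's actual interfaces.
interface ElectionService {
    String name();
}

// Keeps one factory method per component, as the current interface does.
interface HaServicesSketch {
    ElectionService resourceManagerElection();

    ElectionService dispatcherElection();

    ElectionService restEndpointElection();

    ElectionService jobMasterElection(String jobId);
}

// Per-process variant: every factory method returns the same shared instance.
final class PerProcessHaServices implements HaServicesSketch {
    private final ElectionService shared = () -> "per-process-election";

    @Override
    public ElectionService resourceManagerElection() {
        return shared;
    }

    @Override
    public ElectionService dispatcherElection() {
        return shared;
    }

    @Override
    public ElectionService restEndpointElection() {
        return shared;
    }

    @Override
    public ElectionService jobMasterElection(String jobId) {
        // The job id is ignored here because all components share one election.
        return shared;
    }
}

final class SharedElectionDemo {
    public static void main(String[] args) {
        HaServicesSketch ha = new PerProcessHaServices();
        // Both calls return the identical object.
        System.out.println(ha.resourceManagerElection() == ha.dispatcherElection()); // prints true
    }
}
```

Another implementation of the same interface could return a distinct
instance per method if it wants a per-component leader election.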

@Chesnay: I will move the discussion/documentation on how the refactoring
should be done into FLINK-26522 [2]. In the end, it's about modifying the
DefaultLeaderElectionService to support multiple contenders, i.e. merging
the MultipleComponentLeaderElectionService interface into the
LeaderElectionService interface. Ownership and lifecycle of the leader
election clients is explicitly moved from the LeaderContenders into the
HighAvailabilityServices. That's already the case with the current
per-process leader election. It's rather about documenting this properly
now. But as said, see FLINK-26522 [2] for further details.
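
As a rough illustration of what "multiple contenders" plus ownership in the
HA services could look like, here is a sketch under my own assumptions. It
is not the design from FLINK-26522: Contender and MultiContenderElection are
hypothetical names, and the onLeadershipGranted/onLeadershipRevoked hooks
stand in for whatever the HA backend client would call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for a component that takes part in the election.
interface Contender {
    void grantLeadership(UUID sessionId);

    void revokeLeadership();
}

// One election shared by several contenders. In this sketch the owner (e.g. the
// HA services) creates and closes it; the contenders only register/deregister.
final class MultiContenderElection implements AutoCloseable {
    private final Map<String, Contender> contenders = new LinkedHashMap<>();
    private UUID currentSession; // null while this process is not the leader

    synchronized void register(String componentId, Contender contender) {
        if (contenders.putIfAbsent(componentId, contender) != null) {
            throw new IllegalStateException("Already registered: " + componentId);
        }
        // A late joiner is informed right away if leadership is already held.
        if (currentSession != null) {
            contender.grantLeadership(currentSession);
        }
    }

    synchronized void deregister(String componentId) {
        Contender removed = contenders.remove(componentId);
        if (removed != null && currentSession != null) {
            removed.revokeLeadership();
        }
    }

    // Assumed hook: called when the process acquires leadership.
    synchronized void onLeadershipGranted() {
        currentSession = UUID.randomUUID();
        contenders.values().forEach(c -> c.grantLeadership(currentSession));
    }

    // Assumed hook: called when the process loses leadership.
    synchronized void onLeadershipRevoked() {
        currentSession = null;
        contenders.values().forEach(Contender::revokeLeadership);
    }

    @Override
    public synchronized void close() {
        if (currentSession != null) {
            onLeadershipRevoked();
        }
        contenders.clear();
    }
}
```

The point of the sketch is the lifecycle: contenders never create or close
the election themselves, so shutting down one component cannot tear down the
leadership of the others.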

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-26522

On Sun, Dec 11, 2022 at 5:38 AM Zheng Yu Chen <jam.gz...@gmail.com> wrote:

> Thanks, Matthias. I read the previous emails in this thread and would like
> to share my views on a few points.
>
> @Matthias
>
> I think the per-component high-availability split should be retained. As
> David mentioned, once we split the components apart, each one needs its own
> LeaderElectionService. For the single-JM case I have no objection to merging
> them into one, but considering that the JM may support horizontal scaling
> [1] in the future, I suggest keeping it.
>
> @David:
>
> I agree with you. We should rethink how to split the heavy components in
> the JM and support high availability for each of them, instead of simply
> merging everything into a single LeaderElectionService that is returned for
> all components.
> If you have more ideas and suggestions for FLIP-257 [1], we can move to the
> FLIP-257 discussion thread [2].
>
> @Dong:
>
> I think the per-component high-availability split should be retained, as
> David commented.
> I have been researching related approaches and waiting for more positive
> feedback from the community, because this part of the work is more
> complicated than I imagined, and I am afraid that I cannot complete such a
> large effort by myself. What I proposed is only a preliminary solution. In
> fact, I have considered splitting out each service and giving each its own
> HA, but as the FLIP-257 [1] discussion thread [2] notes, this would increase
> the complexity of component communication. If you have more ideas and
> suggestions for FLIP-257 [1], we can move to the FLIP-257 discussion
> thread [2].
>
>
> [1] FLIP-257: Flink JobManager Process Split:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257%3A+Flink+JobManager+Process+Split
> [2] FLIP-257 discussion thread:
> https://lists.apache.org/thread/r3fnw13j5h04z87lb34l42nvob4pq2xj
>
> Matthias Pohl <matthias.p...@aiven.io.invalid> wrote on Wed, Dec 7, 2022 at 16:28:
>
> > Hi everyone,
> >
> > The Flink community changed how leader election works in Flink 1.15 with
> > FLINK-24038 [1]. Instead of a per-component leader election, all
> > components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use
> > a single (per-JM-process) leader election instance. It was meant to fix
> > some issues with deregistering Flink applications in multi-JM setups [1]
> > and reduce load on the HA backend. Users were able to opt out and switch
> > back to the old implementation [2].
> >
> > The new approach was kind of complicated to implement while still
> > maintaining support for the old implementation through the existing
> > interfaces. With FLINK-25806 [3], the old implementation was removed in
> > Flink 1.16. This enables us to clean things up in the
> > HighAvailabilityServices.
> >
> > The proposed change would mean touching the HighAvailabilityServices
> > interface. Currently, the interface provides factory methods for
> > LeaderElectionServices of the aforementioned components. All of these
> > LeaderElectionServices are internally based on the same LeaderElection
> > instance handled in DefaultMultipleComponentLeaderElectionService.
> > Therefore, we can replace all these factory methods by a single one which
> > returns a LeaderElectionService instance that’s going to be used by all
> > components. Of course, we could also stick to the old
> > HighAvailabilityServices and return the same LeaderElectionService
> > instance through each of the four factory methods (similar to what’s done
> > now with the MultipleComponentLeaderElectionService).
> >
> > A similar question appears for the corresponding LeaderRetrievalService:
> > We could create a single listener instead of having individual
> > per-component listeners to reflect the current requirement of having a
> > per-JM-process leader election and align it with the LeaderElectionService
> > approach (if we decide on modifying the HA interface).
> >
> > I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
> > considered a public interface. Still, I am aware it might affect users
> > (e.g. if they implemented their own HA services or if the project
> > monitors HA information in the HA backend outside of Flink). That’s why I
> > wanted to start a discussion here. I’m happy to create a FLIP, if someone
> > thinks it’s worth it. The work is going to be covered by FLINK-26522 [4].
> >
> > Pros (for changing the interface methods):
> >
> >    - It reflects the requirements stated in FLINK-24038 [1] about having a
> >      per-JM-process LeaderElection
> >    - It helps reduce the complexity of the JobManager
> >
> > Cons:
> >
> >    - We lose some flexibility in terms of per-component LeaderElection
> >    - Interface change might affect other projects that customize HA
> >      services
> >
> >
> > I’m in favor of reducing the number of factory methods in
> > HighAvailabilityServices considering that it’s not a public interface.
> > I’m looking forward to your opinions.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-24038
> >
> > [2]
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-25806
> >
> > [4] https://issues.apache.org/jira/browse/FLINK-26522
> >
> >
>
>
> --
> Best
>
> ConradJam
>
