Hi everyone,

With FLINK-24038 [1], the Flink community changed how leader election works
in Flink 1.15. Instead of a per-component leader election, all
components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
single (per-JM-process) leader election instance. It was meant to fix some
issues with deregistering Flink applications in multi-JM setups [1] and
reduce load on the HA backend. Users were able to opt out and switch back
to the old implementation [2].

Implementing the new approach was complicated because the old
implementation still had to be supported through the existing
interfaces. With FLINK-25806 [3], the old implementation was removed in
Flink 1.16. This enables us to clean up the HighAvailabilityServices.

The proposed change would mean touching the HighAvailabilityServices
interface. Currently, the interface provides factory methods for
LeaderElectionServices of the aforementioned components. All of these
LeaderElectionServices are internally based on the same LeaderElection
instance handled in DefaultMultipleComponentLeaderElectionService.
Therefore, we can replace all these factory methods with a single one which
returns a LeaderElectionService instance that’s going to be used by all
components. Of course, we could also stick to the old
HighAvailabilityServices and return the same LeaderElectionService instance
through each of the four factory methods (similar to what’s done now with
the MultipleComponentLeaderElectionService).
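
A rough sketch of the two options, for illustration only. It uses
simplified stand-in types rather than the real Flink classes: the getter
names are modeled on the existing per-component factory methods, while
SingleElectionHaServices, getLeaderElectionService() and
SharedInstanceHaServices are hypothetical names chosen for this sketch.

```java
public class HaServicesSketch {

    /** Simplified stand-in for a leader election service (not the Flink type). */
    public interface LeaderElectionService {}

    /** Roughly the current shape: one factory method per component. */
    public interface PerComponentHaServices {
        LeaderElectionService getResourceManagerLeaderElectionService();

        LeaderElectionService getDispatcherLeaderElectionService();

        LeaderElectionService getClusterRestEndpointLeaderElectionService();

        // The job id is a plain String here to keep the sketch self-contained.
        LeaderElectionService getJobManagerLeaderElectionService(String jobId);
    }

    /** Hypothetical proposed shape: a single factory method for all components. */
    public interface SingleElectionHaServices {
        LeaderElectionService getLeaderElectionService();
    }

    /**
     * The alternative mentioned above: keep the four methods but hand out
     * the same instance from each of them.
     */
    public static final class SharedInstanceHaServices
            implements PerComponentHaServices, SingleElectionHaServices {

        private final LeaderElectionService shared = new LeaderElectionService() {};

        @Override
        public LeaderElectionService getLeaderElectionService() {
            return shared;
        }

        @Override
        public LeaderElectionService getResourceManagerLeaderElectionService() {
            return shared;
        }

        @Override
        public LeaderElectionService getDispatcherLeaderElectionService() {
            return shared;
        }

        @Override
        public LeaderElectionService getClusterRestEndpointLeaderElectionService() {
            return shared;
        }

        @Override
        public LeaderElectionService getJobManagerLeaderElectionService(String jobId) {
            return shared;
        }
    }

    public static void main(String[] args) {
        SharedInstanceHaServices ha = new SharedInstanceHaServices();
        boolean same =
                ha.getDispatcherLeaderElectionService() == ha.getLeaderElectionService();
        System.out.println("same instance: " + same);
    }
}
```

Both variants hand every component the same object; the difference is
only whether the interface still advertises four entry points.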

A similar question arises for the corresponding LeaderRetrievalService: we
could create a single listener instead of having individual per-component
listeners to reflect the current requirement of having a per-JM-process
leader election and align it with the LeaderElectionService approach (if we
decide on modifying the HA interface).
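
As a minimal sketch of the single-listener idea, assuming simplified
stand-in types: LeaderListener is only modeled on a leader-retrieval
callback, and ProcessLeaderRetrieval and RecordingListener are
hypothetical names, not existing Flink classes. One listener is
registered per JM process, and every component reads the leader
information from it.

```java
import java.util.UUID;

public class LeaderRetrievalSketch {

    /** Simplified stand-in for a leader-retrieval callback (not the Flink type). */
    public interface LeaderListener {
        void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
    }

    /** Hypothetical retrieval service tracking one leader per JM process. */
    public static final class ProcessLeaderRetrieval {
        private LeaderListener listener;

        /** Registers the single listener shared by all components. */
        public void start(LeaderListener listener) {
            this.listener = listener;
        }

        /** Called when the HA backend reports a new leading JM process. */
        public void leaderChanged(String address, UUID sessionId) {
            if (listener != null) {
                listener.notifyLeaderAddress(address, sessionId);
            }
        }
    }

    /** Remembers the latest leader so that components can look it up. */
    public static final class RecordingListener implements LeaderListener {
        private String lastAddress;
        private UUID lastSessionId;

        @Override
        public void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId) {
            this.lastAddress = leaderAddress;
            this.lastSessionId = leaderSessionId;
        }

        public String getLastAddress() {
            return lastAddress;
        }

        public UUID getLastSessionId() {
            return lastSessionId;
        }
    }

    public static void main(String[] args) {
        ProcessLeaderRetrieval retrieval = new ProcessLeaderRetrieval();
        RecordingListener listener = new RecordingListener();
        retrieval.start(listener);

        // The address is a made-up placeholder for this sketch.
        retrieval.leaderChanged("jm-1.example:6123", UUID.randomUUID());
        System.out.println("current leader: " + listener.getLastAddress());
    }
}
```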

I didn’t create a dedicated FLIP because HighAvailabilityServices is not
considered a public interface. Still, I am aware it might affect users
(e.g. if they implemented their own HA services or if the project monitors
HA information in the HA backend outside of Flink). That’s why I wanted to
start a discussion here. I’m happy to create a FLIP, if someone thinks it’s
worth it. The work is going to be covered by FLINK-26522 [4].

Pros (for changing the interface methods):

   - It reflects the requirements stated in FLINK-24038 [1] about having a
     per-JM-process LeaderElection
   - It helps reduce the complexity of the JobManager

Cons:

   - We lose some flexibility in terms of per-component LeaderElection
   - The interface change might affect other projects that customize HA
     services

I’m in favor of reducing the number of factory methods in
HighAvailabilityServices considering that it’s not a public interface. I’m
looking forward to your opinions.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038

[2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services

[3] https://issues.apache.org/jira/browse/FLINK-25806

[4] https://issues.apache.org/jira/browse/FLINK-26522


-- 

Matthias Pohl

Software Engineer, Aiven
