Hi Dong,
see my answers below.

> Regarding "Interface change might affect other projects that customize HA
> services", are you referring to those projects which hack into Flink's
> source code (as opposed to using Flink's public API) to customize HA
> services?


Yes, the proposed change might affect two kinds of projects: those that
maintain their own HA implementation for whatever reason (affected by the
interface change), and those that access the HA backend directly to retrieve
metadata from the ZK node/k8s ConfigMap (affected by the change in how the
data is stored in the HA backend). The latter were actually already affected
by the change introduced by FLINK-24038 [1].

> By the way, since Flink already supports zookeeper and kubernetes as the
> high availability services, are you aware of many projects that still need
> to hack into Flink's code to customize high availability services?


I am aware of projects that use customized HA. But in our experience with
FLINK-24038 [1], no one complained. So, announcing the change on the
mailing list might be good enough.

> And regarding "We lose some flexibility in terms of per-component
> LeaderElection", could you explain what flexibility we need so that we can
> gauge the associated downside of losing the flexibility?


Just to recap: The current interface allows having per-component
LeaderElection (e.g. the ResourceManager leader can run on a different
JobManager than the Dispatcher). This implementation was replaced by
FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
LeaderElection per process (e.g. ResourceManager and Dispatcher always run
on the same JobManager). The changed interface would require us to touch
the interface again if (for whatever reason) we want to reintroduce
per-component leader election in some form.
The interface change is, strictly speaking, not necessary to provide the
new functionality. But I like the idea of certain requirements (currently,
we need per-process leader election to fix what was reported in FLINK-24038
[1]) being reflected in the interface. This makes sure that we don't
introduce a per-component leader election again accidentally in the future
because we thought it's a good idea but forgot about FLINK-24038.
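
To illustrate what I mean by the requirement being reflected in the
interface, here is a minimal Java sketch. All type and method names in it
are hypothetical simplifications for this mail, not Flink's actual
HighAvailabilityServices API. It only shows that a per-component interface
allows both independent and shared elections, while a single factory method
leaves room for just one.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for a leader election handle; not a Flink type.
final class Election {
    final String scope;
    Election(String scope) { this.scope = scope; }
}

// Per-component shape: one factory method per component (illustrative names).
interface PerComponentHaServices {
    Election resourceManagerElection();
    Election dispatcherElection();
    Election restEndpointElection();
}

// Per-process shape: a single factory method shared by all components.
interface PerProcessHaServices {
    Election leaderElection();
}

// Each component gets its own election, so leaders may end up on different JMs.
final class IndependentElections implements PerComponentHaServices {
    public Election resourceManagerElection() { return new Election("resource-manager"); }
    public Election dispatcherElection() { return new Election("dispatcher"); }
    public Election restEndpointElection() { return new Election("rest-endpoint"); }
}

// Old interface kept, but every method hands out the same per-process election.
// Nothing in the interface itself enforces this; it is only a convention.
final class SharedBehindOldInterface implements PerComponentHaServices {
    private final Election election = new Election("jobmanager-process");
    public Election resourceManagerElection() { return election; }
    public Election dispatcherElection() { return election; }
    public Election restEndpointElection() { return election; }
}

// New interface: there is only one election to hand out.
final class SharedElection implements PerProcessHaServices {
    private final Election election = new Election("jobmanager-process");
    public Election leaderElection() { return election; }
}

final class LeaderElectionSketch {
    // Counts how many distinct election instances (by identity) an implementation exposes.
    static int distinctElections(PerComponentHaServices ha) {
        Set<Election> seen =
                Collections.newSetFromMap(new IdentityHashMap<Election, Boolean>());
        seen.add(ha.resourceManagerElection());
        seen.add(ha.dispatcherElection());
        seen.add(ha.restEndpointElection());
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctElections(new IndependentElections()));
        System.out.println(distinctElections(new SharedBehindOldInterface()));
    }
}
```

With the per-component interface, both implementations compile, and only a
check like distinctElections can tell them apart. With the single-method
interface, an accidental per-component election would require changing the
interface first.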

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-25806

On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <lindon...@gmail.com> wrote:

> Hi Matthias,
>
> Thanks for the proposal! Overall I am in favor of making this interface
> change to make Flink's codebase more maintainable.
>
> Regarding "Interface change might affect other projects that customize HA
> services", are you referring to those projects which hack into Flink's
> source code (as opposed to using Flink's public API) to customize HA
> services? If yes, it seems OK to break those projects since we don't have
> any backward compatibility guarantee for those projects.
>
> By the way, since Flink already supports zookeeper and kubernetes as the
> high availability services, are you aware of many projects that still need
> to hack into Flink's code to customize high availability services?
>
> And regarding "We lose some flexibility in terms of per-component
> LeaderElection", could you explain what flexibility we need so that we can
> gauge the associated downside of losing the flexibility?
>
> Thanks!
> Dong
>
>
>
> On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
>
> > Hi everyone,
> >
> > The Flink community introduced a new way leader election works in Flink
> > 1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all
> > components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
> > single (per-JM-process) leader election instance. It was meant to fix some
> > issues with deregistering Flink applications in multi-JM setups [1] and
> > reduce load on the HA backend. Users were able to opt out and switch back
> > to the old implementation [2].
> >
> > The new approach was kind of complicated to implement while still
> > maintaining support for the old implementation through the existing
> > interfaces. With FLINK-25806 [3], the old implementation was removed in
> > Flink 1.16. This enables us to clean things up in the
> > HighAvailabilityServices.
> >
> > The proposed change would mean touching the HighAvailabilityServices
> > interface. Currently, the interface provides factory methods for
> > LeaderElectionServices of the aforementioned components. All of these
> > LeaderElectionServices are internally based on the same LeaderElection
> > instance handled in DefaultMultipleComponentLeaderElectionService.
> > Therefore, we can replace all these factory methods by a single one which
> > returns a LeaderElectionService instance that’s going to be used by all
> > components. Of course, we could also stick to the old
> > HighAvailabilityServices and return the same LeaderElectionService instance
> > through each of the four factory methods (similar to what’s done now with
> > the MultipleComponentLeaderElectionService).
> >
> > A similar question arises for the corresponding LeaderRetrievalService: We
> > could create a single listener instead of having individual per-component
> > listeners to reflect the current requirement of having a per-JM-process
> > leader election and align it with the LeaderElectionService approach (if we
> > decide on modifying the HA interface).
> >
> > I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
> > considered a public interface. Still, I am aware it might affect users
> > (e.g. if they implemented their own HA services or if the project monitors
> > HA information in the HA backend outside of Flink). That’s why I wanted to
> > start a discussion here. I’m happy to create a FLIP, if someone thinks it’s
> > worth it. The work is going to be covered by FLINK-26522 [4].
> >
> > Pros (for changing the interface methods):
> >
> >    - It reflects the requirements stated in FLINK-24038 [1] about having a
> >      per-JM-process LeaderElection
> >    - It helps reduce the complexity of the JobManager
> >
> > Cons:
> >
> >    - We lose some flexibility in terms of per-component LeaderElection
> >    - Interface change might affect other projects that customize HA
> >      services
> >
> >
> > I’m in favor of reducing the number of factory methods in
> > HighAvailabilityServices considering that it’s not a public interface. I’m
> > looking forward to your opinions.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-24038
> >
> > [2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-25806
> >
> > [4] https://issues.apache.org/jira/browse/FLINK-26522
> >
> >
> >
>
