Hi Matthias,

Thanks for the explanation. I was trying to understand the concrete user-facing benefits of preserving the flexibility of per-component leader election. Now I understand that users might want to scale those components independently, and perhaps run the UI in an environment that is more accessible than the other processes.
I replied to Chesnay's email regarding whether it is worthwhile to keep the existing interface for those potential but not-yet-realized benefits.

Thanks,
Dong

On Fri, Dec 9, 2022 at 5:47 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:

> Hi Dong,
> see my answers below.
>
> > Regarding "Interface change might affect other projects that customize HA
> > services", are you referring to those projects which hack into Flink's
> > source code (as opposed to using Flink's public API) to customize HA
> > services?
>
> Yes, the proposed change might affect projects that need their own HA
> implementation for whatever reason (interface change), or projects that
> access the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
> (change in how the data is stored in the HA backend). The latter was
> actually already the case with the change introduced by FLINK-24038 [1].
>
> > By the way, since Flink already supports ZooKeeper and Kubernetes as the
> > high availability services, are you aware of many projects that still need
> > to hack into Flink's code to customize high availability services?
>
> I am aware of projects that use customized HA. But based on our experience
> with FLINK-24038 [1], no one complained. So, making people aware through the
> mailing list might be good enough.
>
> > And regarding "We lose some flexibility in terms of per-component
> > LeaderElection", could you explain what flexibility we need so that we can
> > gauge the associated downside of losing the flexibility?
>
> Just to recap: The current interface allows having per-component
> LeaderElection (e.g. the ResourceManager leader can run on a different
> JobManager than the Dispatcher). This implementation was replaced by
> FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
> LeaderElection per process (e.g. ResourceManager and Dispatcher always run
> on the same JobManager).
> The changed interface would require us to touch the interface again if (for
> whatever reason) we want to reintroduce per-component leader election in
> some form.
> The interface change is, strictly speaking, not necessary to provide the
> new functionality. But I like the idea of certain requirements (currently,
> we need per-process leader election to fix what was reported in FLINK-24038
> [1]) being reflected in the interface. This ensures that we don't
> accidentally reintroduce per-component leader election in the future
> because we thought it was a good idea but forgot about FLINK-24038.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-24038
> [2] https://issues.apache.org/jira/browse/FLINK-25806
>
> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi Matthias,
> >
> > Thanks for the proposal! Overall I am in favor of making this interface
> > change to make Flink's codebase more maintainable.
> >
> > Regarding "Interface change might affect other projects that customize HA
> > services", are you referring to those projects which hack into Flink's
> > source code (as opposed to using Flink's public API) to customize HA
> > services? If yes, it seems OK to break those projects since we don't have
> > any backward compatibility guarantee for those projects.
> >
> > By the way, since Flink already supports ZooKeeper and Kubernetes as the
> > high availability services, are you aware of many projects that still need
> > to hack into Flink's code to customize high availability services?
> >
> > And regarding "We lose some flexibility in terms of per-component
> > LeaderElection", could you explain what flexibility we need so that we can
> > gauge the associated downside of losing the flexibility?
> >
> > Thanks!
> >
> > Dong
> >
> > On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
> >
> > > Hi everyone,
> > >
> > > With FLINK-24038 [1], the Flink community changed how leader election works
> > > in Flink 1.15. Instead of a per-component leader election, all components
> > > (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a single
> > > (per-JM-process) leader election instance. It was meant to fix some issues
> > > with deregistering Flink applications in multi-JM setups [1] and to reduce
> > > the load on the HA backend. Users were able to opt out and switch back to
> > > the old implementation [2].
> > >
> > > The new approach was fairly complicated to implement while still
> > > maintaining support for the old implementation through the existing
> > > interfaces. With FLINK-25806 [3], the old implementation was removed in
> > > Flink 1.16. This enables us to clean things up in the
> > > HighAvailabilityServices.
> > >
> > > The proposed change would mean touching the HighAvailabilityServices
> > > interface. Currently, the interface provides factory methods for the
> > > LeaderElectionServices of the aforementioned components. All of these
> > > LeaderElectionServices are internally based on the same LeaderElection
> > > instance handled in DefaultMultipleComponentLeaderElectionService.
> > > Therefore, we can replace all these factory methods with a single one that
> > > returns a LeaderElectionService instance used by all components. Of
> > > course, we could also stick to the old HighAvailabilityServices and return
> > > the same LeaderElectionService instance through each of the four factory
> > > methods (similar to what’s done now with the
> > > MultipleComponentLeaderElectionService).
> > >
> > > A similar question arises for the corresponding LeaderRetrievalService: we
> > > could create a single listener instead of having individual per-component
> > > listeners, to reflect the current requirement of having a per-JM-process
> > > leader election and to align it with the LeaderElectionService approach
> > > (if we decide on modifying the HA interface).
> > >
> > > I didn’t create a dedicated FLIP because HighAvailabilityServices is not
> > > considered a public interface. Still, I am aware it might affect users
> > > (e.g. if they implemented their own HA services or if their project
> > > monitors HA information in the HA backend outside of Flink). That’s why I
> > > wanted to start a discussion here. I’m happy to create a FLIP if someone
> > > thinks it’s worth it. The work is going to be covered by FLINK-26522 [4].
> > >
> > > Pros (for changing the interface methods):
> > >
> > > - It reflects the requirements stated in FLINK-24038 [1] about having a
> > >   per-JM-process LeaderElection
> > > - It helps reduce the complexity of the JobManager
> > >
> > > Cons:
> > >
> > > - We lose some flexibility in terms of per-component LeaderElection
> > > - Interface change might affect other projects that customize HA services
> > >
> > > I’m in favor of reducing the number of factory methods in
> > > HighAvailabilityServices, considering that it’s not a public interface.
> > > I’m looking forward to your opinions.
> > >
> > > Matthias
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-24038
> > > [2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> > > [3] https://issues.apache.org/jira/browse/FLINK-25806
> > > [4] https://issues.apache.org/jira/browse/FLINK-26522
> > >
> > > --
> > > Matthias Pohl
> > > Software Engineer, Aiven
> > > matthias.p...@aiven.io
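The consolidation Matthias proposes, replacing the per-component factory methods in HighAvailabilityServices with a single one, can be illustrated with a small, self-contained Java sketch. It is a simplification under stated assumptions, not Flink code: PerComponentHaServices, PerProcessHaServices, SharedLeaderElection, RecordingContender and the `register` method are hypothetical names. The interfaces do not reproduce the real signatures of HighAvailabilityServices, LeaderElectionService or LeaderContender. The sketch aims to show the per-process semantics described in the thread: one election per JobManager process decides leadership, and every component registered with it, including one that registers after the election, gains or loses leadership together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch, not Flink's API: all names below are illustrative.
public class PerProcessLeaderElectionSketch {

    /** A component that can be granted or revoked leadership (simplified, hypothetical). */
    interface LeaderContender {
        void grantLeadership(UUID sessionId);
        void revokeLeadership();
    }

    /** Hypothetical service: one election per process, many registered contenders. */
    interface LeaderElectionService {
        void register(String componentId, LeaderContender contender);
    }

    /** "Before": one factory method per component (illustrative names, shown for contrast only). */
    interface PerComponentHaServices {
        LeaderElectionService getResourceManagerLeaderElectionService();
        LeaderElectionService getDispatcherLeaderElectionService();
        LeaderElectionService getRestEndpointLeaderElectionService();
        LeaderElectionService getJobMasterLeaderElectionService(String jobId);
    }

    /** "After": a single factory method returning the process-wide service. */
    interface PerProcessHaServices {
        LeaderElectionService getLeaderElectionService();
    }

    /** In-memory stand-in for the single leader election that backs all components. */
    static final class SharedLeaderElection implements LeaderElectionService {
        private final List<LeaderContender> contenders = new ArrayList<>();
        private UUID currentSession; // null while this process is not the leader

        @Override
        public void register(String componentId, LeaderContender contender) {
            contenders.add(contender);
            if (currentSession != null) {
                // A component registering late joins the process's existing leadership.
                contender.grantLeadership(currentSession);
            }
        }

        /** Invoked when the HA backend elects this process: every component leads. */
        void onProcessElected() {
            currentSession = UUID.randomUUID();
            for (LeaderContender contender : contenders) {
                contender.grantLeadership(currentSession);
            }
        }

        /** Invoked when this process loses leadership: every component steps down. */
        void onProcessRevoked() {
            currentSession = null;
            for (LeaderContender contender : contenders) {
                contender.revokeLeadership();
            }
        }
    }

    /** Records whether it currently holds leadership. */
    static final class RecordingContender implements LeaderContender {
        boolean leader;

        @Override public void grantLeadership(UUID sessionId) { leader = true; }
        @Override public void revokeLeadership() { leader = false; }
    }

    /** Returns the leadership flags of three components after election and after revocation. */
    static String demo() {
        SharedLeaderElection election = new SharedLeaderElection();
        PerProcessHaServices haServices = () -> election; // the single factory method

        RecordingContender resourceManager = new RecordingContender();
        RecordingContender dispatcher = new RecordingContender();
        RecordingContender restEndpoint = new RecordingContender();
        haServices.getLeaderElectionService().register("resourcemanager", resourceManager);
        haServices.getLeaderElectionService().register("dispatcher", dispatcher);

        election.onProcessElected();
        haServices.getLeaderElectionService().register("rest", restEndpoint); // registers late
        String afterElection =
                resourceManager.leader + " " + dispatcher.leader + " " + restEndpoint.leader;

        election.onProcessRevoked();
        String afterRevocation =
                resourceManager.leader + " " + dispatcher.leader + " " + restEndpoint.leader;
        return afterElection + " / " + afterRevocation;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true true true / false false false"
    }
}
```

The trade-off listed under Cons shows up directly in this shape. With a single shared election, a setup where the ResourceManager leader runs on a different JobManager than the Dispatcher can no longer be expressed through the interface. That is the per-process requirement from FLINK-24038 being encoded in the types.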
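The single-listener idea for the LeaderRetrievalService can also be sketched in a few lines. This is a hypothetical, in-memory illustration rather than Flink's API: LeaderInformation, LeaderRetrievalListener, `notifyLeader`, `publish` and the component addresses are all made up for the example. It assumes that, because all components of the leading process share one leader session, a client can receive the addresses of every component in one notification instead of registering one listener per component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Flink's API: all names and addresses below are illustrative.
public class SingleLeaderRetrievalSketch {

    /** Snapshot of the leading process: one session id and one address per component. */
    static final class LeaderInformation {
        private final String sessionId;
        private final Map<String, String> componentAddresses;

        LeaderInformation(String sessionId, Map<String, String> componentAddresses) {
            this.sessionId = sessionId;
            // TreeMap keeps the component order deterministic for printing.
            this.componentAddresses = new TreeMap<>(componentAddresses);
        }

        String sessionId() { return sessionId; }

        Map<String, String> componentAddresses() { return componentAddresses; }
    }

    /** One listener per client instead of one listener per component. */
    interface LeaderRetrievalListener {
        void notifyLeader(LeaderInformation leaderInformation);
    }

    /** In-memory stand-in for watching the HA backend for leader changes. */
    static final class LeaderRetrievalService {
        private final List<LeaderRetrievalListener> listeners = new ArrayList<>();

        void start(LeaderRetrievalListener listener) {
            listeners.add(listener);
        }

        /** A new process became leader: a single notification covers all of its components. */
        void publish(LeaderInformation leaderInformation) {
            for (LeaderRetrievalListener listener : listeners) {
                listener.notifyLeader(leaderInformation);
            }
        }
    }

    /** Returns every notification the single listener received, one per line. */
    static String demo() {
        LeaderRetrievalService retrieval = new LeaderRetrievalService();
        List<String> received = new ArrayList<>();
        retrieval.start(info -> received.add(info.sessionId() + " " + info.componentAddresses()));

        Map<String, String> addresses = new TreeMap<>();
        addresses.put("dispatcher", "jm-1/dispatcher");
        addresses.put("resourcemanager", "jm-1/resourcemanager");
        addresses.put("rest", "http://jm-1:8081");
        retrieval.publish(new LeaderInformation("session-1", addresses));
        return String.join("\n", received);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under these assumptions, a client that needs both the Dispatcher and the REST endpoint would not have to reconcile two independently updated leaders. Both addresses arrive together under the same session id.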