Hi Chesnay,

I like the use-cases mentioned (e.g. running multiple UIs for load-balancing purposes). On the other hand, these are probably not high-priority features, and we don't know when the community will get around to implementing them. Isn't it a bit of over-design to add implementation complexity for something that we may never need?

Regarding the effort to add back the per-component election capability: given that the implementation already follows per-process election, and given that achieving the use-cases described above will likely require a lot of extra design, implementation, and testing effort anyway, maybe the change proposed in this thread won't affect the overall effort much?

I am hoping that by making the Flink codebase simpler and more readable, we can increase developer velocity and reduce the time we need to tackle bugs such as FLINK-24038. Then we would have more time to actually implement the fancy use-cases described above :)

What do you think?

On Sat, Dec 10, 2022 at 1:31 AM Chesnay Schepler <ches...@apache.org> wrote:

> I generally agree that the internals of the HA services are currently too complex, but I'm wondering if the proposal doesn't go a bit too far to resolve those.
> Is there maybe some way we can refactor things internally to reduce complexity while keeping the per-component semantics?
>
> Ultimately, the per-component leader election gives us the theoretical ability to split components into separate processes, which is also something we strive to maintain in other layers like the RPC system.
>
> That's a powerful property to have, which is also quite difficult to patch back in once you get rid of it.
> Of note, whenever a discussion came up about scalability of the JM process the first answer has _always_ been "well we could split it up at one point if it's necessary.".
>
> > I am curious that there are so many such extreme requirements that we have to rely on the per-component pattern to achieve them?
>
> This doesn't necessarily go into the _extreme_ direction.
> It could be something as simple as running the UI in an environment that is more accessible than the other processes, running multiple UIs for load-balancing purposes without paying the additional memory tax of a full JM, or the Dispatcher process not running any user-code (== some isolation between jobs). The original FLIP-6 design had ideas to that end, and they aren't really bad ideas. We just never executed them.
>
> > users may inadvertently recreate problems similar to FLINK-24038
>
> That's certainly a risk, but the per-process leader election was just one possible solution, that just also had other benefits at the time.
>
> Right now I unfortunately can't provide specific ideas on how we could clean things up internally; that'd take some time that I won't have until next year.
>
> On 09/12/2022 16:41, weijie guo wrote:
> > Hi Matthias,
> >
> > Thanks for the proposal! I am in favor of cleaning up this interface, as it seems a bit cumbersome now. Especially, the implementation of per-component leader election has been removed from our current code path.
> >
> > To be honest, I don't like the per-component approach. I am often asked why Flink does it this way. Of course, I admit that this will make our HA service more flexible. But personally, perhaps the per-process solution is better, at least from the perspective of reducing potential problems like FLINK-24038, and it can definitely reduce the complexity of JobManager.
> >
> > Regarding "We lose some flexibility in terms of per-component LeaderElection", I am curious that there are so many such extreme requirements that we have to rely on the per-component pattern to achieve them? If there are, is this requirement really reasonable, and users may inadvertently recreate problems similar to FLINK-24038.
> >
> > Best regards,
> >
> > Weijie
> >
> > On Fri, Dec 9, 2022 at 17:47, Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
> >
> >> Hi Dong,
> >> see my answers below.
> >>
> >>> Regarding "Interface change might affect other projects that customize HA services", are you referring to those projects which hack into Flink's source code (as opposed to using Flink's public API) to customize HA services?
> >>
> >> Yes, the proposed change might affect projects that need to have their own HA implementation for whatever reason (interface change) or if a project accesses the HA backend to retrieve metadata from the ZK node/k8s ConfigMap (change about how the data is stored in the HA backend). The latter one was actually already the case with the change introduced by FLINK-24038 [1].
> >>
> >>> By the way, since Flink already supports zookeeper and kubernetes as the high availability services, are you aware of many projects that still need to hack into Flink's code to customize high availability services?
> >>
> >> I am aware of projects that use customized HA. But based on our experience in FLINK-24038 [1] no one complained. So, making people aware through the mailing list might be good enough.
> >>
> >>> And regarding "We lose some flexibility in terms of per-component LeaderElection", could you explain what flexibility we need so that we can gauge the associated downside of losing the flexibility?
> >>
> >> Just to recap: The current interface allows having per-component LeaderElection (e.g. the ResourceManager leader can run on a different JobManager than the Dispatcher). This implementation was replaced by FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does LeaderElection per process (e.g. ResourceManager and Dispatcher always run on the same JobManager).
> >> The changed interface would require us to touch the interface again if (for whatever reason) we want to reintroduce per-component leader election in some form.
> >> The interface change is, strictly speaking, not necessary to provide the new functionality. But I like the idea of certain requirements (currently, we need per-process leader election to fix what was reported in FLINK-24038 [1]) being reflected in the interface. This makes sure that we don't introduce a per-component leader election again accidentally in the future because we thought it's a good idea but forgot about FLINK-24038.
> >>
> >> Matthias
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-24038
> >> [2] https://issues.apache.org/jira/browse/FLINK-25806
> >>
> >> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <lindon...@gmail.com> wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> Thanks for the proposal! Overall I am in favor of making this interface change to make Flink's codebase more maintainable.
> >>>
> >>> Regarding "Interface change might affect other projects that customize HA services", are you referring to those projects which hack into Flink's source code (as opposed to using Flink's public API) to customize HA services? If yes, it seems OK to break those projects since we don't have any backward compatibility guarantee for those projects.
> >>>
> >>> By the way, since Flink already supports zookeeper and kubernetes as the high availability services, are you aware of many projects that still need to hack into Flink's code to customize high availability services?
> >>>
> >>> And regarding "We lose some flexibility in terms of per-component LeaderElection", could you explain what flexibility we need so that we can gauge the associated downside of losing the flexibility?
> >>>
> >>> Thanks!
> >>> Dong
> >>>
> >>> On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> The Flink community introduced a new approach to leader election in Flink 1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a single (per-JM-process) leader election instance. It was meant to fix some issues with deregistering Flink applications in multi-JM setups [1] and reduce load on the HA backend. Users were able to opt out and switch back to the old implementation [2].
> >>>>
> >>>> The new approach was kind of complicated to implement while still maintaining support for the old implementation through the existing interfaces. With FLINK-25806 [3], the old implementation was removed in Flink 1.16. This enables us to clean things up in the HighAvailabilityServices.
> >>>>
> >>>> The proposed change would mean touching the HighAvailabilityServices interface. Currently, the interface provides factory methods for LeaderElectionServices of the aforementioned components. All of these LeaderElectionServices are internally based on the same LeaderElection instance handled in DefaultMultipleComponentLeaderElectionService. Therefore, we can replace all these factory methods by a single one which returns a LeaderElectionService instance that’s going to be used by all components. Of course, we could also stick to the old HighAvailabilityServices and return the same LeaderElectionService instance through each of the four factory methods (similar to what’s done now with the MultipleComponentLeaderElectionService).
> >>>>
> >>>> A similar question appears for the corresponding LeaderRetrievalService: We could create a single listener instead of having individual per-component listeners to reflect the current requirement of having a per-JM-process leader election and align it with the LeaderElectionService approach (if we decide on modifying the HA interface).
> >>>>
> >>>> I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not considered a public interface. Still, I am aware it might affect users (e.g. if they implemented their own HA services or if the project monitors HA information in the HA backend outside of Flink). That’s why I wanted to start a discussion here. I’m happy to create a FLIP, if someone thinks it’s worth it. The work is going to be covered by FLINK-26522 [4].
> >>>>
> >>>> Pros (for changing the interface methods):
> >>>>
> >>>> - It reflects the requirements stated in FLINK-24038 [1] about having a per-JM-process LeaderElection
> >>>> - It helps reduce the complexity of the JobManager
> >>>>
> >>>> Cons:
> >>>>
> >>>> - We lose some flexibility in terms of per-component LeaderElection
> >>>> - Interface change might affect other projects that customize HA services
> >>>>
> >>>> I’m in favor of reducing the number of factory methods in HighAvailabilityServices considering that it’s not a public interface. I’m looking forward to your opinions.
> >>>>
> >>>> Matthias
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/FLINK-24038
> >>>> [2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> >>>> [3] https://issues.apache.org/jira/browse/FLINK-25806
> >>>> [4] https://issues.apache.org/jira/browse/FLINK-26522
> >>>>
> >>>> --
> >>>> Matthias Pohl
> >>>> Software Engineer, Aiven
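
To make the two options from the original proposal easier to compare, here is a minimal Java sketch. All type and method names in it (LeaderElectionService as used here, PerComponentHaServices, PerProcessHaServices, SharedBehindOldInterface, SingleFactory) are hypothetical stand-ins chosen for illustration; they are not Flink's actual HighAvailabilityServices API. Option A keeps four per-component factory methods that all hand out the same instance, while option B exposes a single factory method.

```java
public class LeaderElectionSketch {

    /** Hypothetical handle for a leader election; not Flink's real interface. */
    interface LeaderElectionService {
        String electionId();
    }

    /** Assumed "old" shape: one factory method per component. */
    interface PerComponentHaServices {
        LeaderElectionService resourceManagerElection();
        LeaderElectionService dispatcherElection();
        LeaderElectionService restServerElection();
        LeaderElectionService jobMasterElection();
    }

    /** Assumed "proposed" shape: a single factory method for the whole JM process. */
    interface PerProcessHaServices {
        LeaderElectionService leaderElection();
    }

    /** Trivial election handle that only carries an identifier. */
    static final class SimpleElection implements LeaderElectionService {
        private final String id;

        SimpleElection(String id) {
            this.id = id;
        }

        @Override
        public String electionId() {
            return id;
        }
    }

    /** Option A: keep the old interface, return the same instance from every method. */
    static final class SharedBehindOldInterface implements PerComponentHaServices {
        private final LeaderElectionService shared = new SimpleElection("jm-process");

        @Override public LeaderElectionService resourceManagerElection() { return shared; }
        @Override public LeaderElectionService dispatcherElection() { return shared; }
        @Override public LeaderElectionService restServerElection() { return shared; }
        @Override public LeaderElectionService jobMasterElection() { return shared; }
    }

    /** Option B: the interface itself states that there is only one election. */
    static final class SingleFactory implements PerProcessHaServices {
        private final LeaderElectionService shared = new SimpleElection("jm-process");

        @Override public LeaderElectionService leaderElection() { return shared; }
    }

    public static void main(String[] args) {
        PerComponentHaServices optionA = new SharedBehindOldInterface();
        PerProcessHaServices optionB = new SingleFactory();

        // Under option A, sharing is an implementation detail the caller has to verify.
        boolean sameInstance =
                optionA.resourceManagerElection() == optionA.dispatcherElection();
        System.out.println("Option A shares one election: " + sameInstance);

        // Under option B, there is no per-component method to diverge from.
        System.out.println("Option B election id: " + optionB.leaderElection().electionId());
    }
}
```

In option A nothing in the interface stops a later implementation from returning a distinct election per component, which is the accidental regression Matthias mentions. Option B encodes the per-process requirement in the type, at the cost of having to change the interface again if per-component election is ever reintroduced.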