Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Dong Lin Fri, 09 Dec 2022 21:32:21 -0800

Hi Chesnay,

I like the use-cases you mentioned (e.g. running multiple UIs for
load-balancing purposes). On the other hand, these are probably not
high-priority features, and we don't know when the community will get to
implementing them. Isn't it a bit of over-design to add implementation
complexity for something that we won't need?


Also, regarding the effort to add back the per-component election
capability: given that the implementation already follows per-process
election, and given that a lot of extra design/implementation/test effort
will likely be needed to achieve the use-cases described above anyway,
maybe the change proposed in this thread won't affect the overall effort
much?

I am hoping that by making the Flink codebase simpler and more readable, we
can increase developer velocity and reduce the time we need to tackle bugs
such as FLINK-24038. Then we would have more time to actually implement the
fancy use-cases described above. :) What do you think?


On Sat, Dec 10, 2022 at 1:31 AM Chesnay Schepler <[email protected]> wrote:

> I generally agree that the internals of the HA services are currently
> too complex, but I'm wondering if the proposal doesn't go a bit too far
> to resolve those.
> Is there maybe some way we can refactor things internally to reduce
> complexity while keeping the per-component semantics?
>
> Ultimately, the per-component leader election gives us the theoretical
> ability to split components into separate processes, which is also
> something we strive to maintain in other layers like the RPC system.
>
> That's a powerful property to have, which is also quite difficult to
> patch back in once you get rid of it.
>
> Of note, whenever a discussion came up about scalability of the JM
> process, the first answer has _always_ been "well, we could split it up at
> some point if it's necessary."
>
>  > Are there really so many such extreme requirements that we have to
>  > rely on the per-component pattern to achieve them?
>
> This doesn't necessarily go into the _extreme_ direction. It could be
> something as simple as running the UI in an environment that is more
> accessible than the other processes, running multiple UIs for
> load-balancing purposes without paying the additional memory tax of a
> full JM, or the Dispatcher process not running any user-code (== some
> isolation between jobs).
>
> The original FLIP-6 design had ideas to that end, and they aren't really
> bad ideas. We just never executed them.
>
>  > users may inadvertently recreate problems similar to FLINK-24038
>
> That's certainly a risk, but the per-process leader election was just
> one possible solution, which also happened to have other benefits at the time.
>
>
>
> Right now I unfortunately can't provide specific ideas on how we could
> clean things up internally; that'd take some time that I won't have
> until next year.
>
> On 09/12/2022 16:41, weijie guo wrote:
> > Hi Matthias,
> >
> > Thanks for the proposal! I am in favor of cleaning up this interface,
> > which seems a bit cumbersome now, especially since the implementation of
> > per-component leader election has been removed from our current code
> > path.
> >
> > To be honest, I don't like the per-component approach. I'm even often
> > asked why Flink does it this way. Of course, I admit that it makes our HA
> > service more flexible. But personally, perhaps the per-process solution
> > is better, at least from the perspective of reducing potential problems
> > like FLINK-24038, and it can definitely reduce the complexity of the
> > JobManager.
> >
> > Regarding "We lose some flexibility in terms of per-component
> > LeaderElection", are there really so many such extreme requirements that
> > we have to rely on the per-component pattern to achieve them? And if there
> > are, are these requirements really reasonable, given that users may
> > inadvertently recreate problems similar to FLINK-24038?
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Matthias Pohl <[email protected]> wrote on Fri, Dec 9, 2022 at 17:47:
> >
> >> Hi Dong,
> >> see my answers below.
> >>
> >>> Regarding "Interface change might affect other projects that customize HA
> >>> services", are you referring to those projects which hack into Flink's
> >>> source code (as opposed to using Flink's public API) to customize HA
> >>> services?
> >>
> >> Yes, the proposed change might affect projects that need to have their own
> >> HA implementation for whatever reason (interface change) or projects that
> >> access the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
> >> (change in how the data is stored in the HA backend). The latter was
> >> actually already the case with the change introduced by FLINK-24038 [1].
> >>
> >>> By the way, since Flink already supports ZooKeeper and Kubernetes as the
> >>> high availability services, are you aware of many projects that still need
> >>> to hack into Flink's code to customize high availability services?
> >>
> >> I am aware of projects that use customized HA. But based on our experience
> >> with FLINK-24038 [1], no one complained. So, making people aware through the
> >> mailing list might be good enough.
> >>
> >>> And regarding "We lose some flexibility in terms of per-component
> >>> LeaderElection", could you explain what flexibility we need so that we can
> >>> gauge the associated downside of losing the flexibility?
> >>
> >> Just to recap: The current interface allows having per-component
> >> LeaderElection (e.g. the ResourceManager leader can run on a different
> >> JobManager than the Dispatcher). This implementation was replaced by
> >> FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
> >> LeaderElection per process (e.g. ResourceManager and Dispatcher always run
> >> on the same JobManager). With the changed interface, we would have to touch
> >> the interface again if (for whatever reason) we want to reintroduce
> >> per-component leader election in some form.
> >> The interface change is, strictly speaking, not necessary to provide the
> >> new functionality. But I like the idea of certain requirements (currently,
> >> we need per-process leader election to fix what was reported in FLINK-24038
> >> [1]) being reflected in the interface. This makes sure that we don't
> >> accidentally reintroduce per-component leader election in the future
> >> because we thought it was a good idea but forgot about FLINK-24038.
> >>
> >> Matthias
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-24038
> >> [2] https://issues.apache.org/jira/browse/FLINK-25806
> >>
> >> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <[email protected]> wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> Thanks for the proposal! Overall I am in favor of making this interface
> >>> change to make Flink's codebase more maintainable.
> >>>
> >>> Regarding "Interface change might affect other projects that customize HA
> >>> services", are you referring to those projects which hack into Flink's
> >>> source code (as opposed to using Flink's public API) to customize HA
> >>> services? If yes, it seems OK to break those projects since we don't have
> >>> any backward compatibility guarantee for those projects.
> >>>
> >>> By the way, since Flink already supports ZooKeeper and Kubernetes as the
> >>> high availability services, are you aware of many projects that still need
> >>> to hack into Flink's code to customize high availability services?
> >>>
> >>> And regarding "We lose some flexibility in terms of per-component
> >>> LeaderElection", could you explain what flexibility we need so that we can
> >>> gauge the associated downside of losing the flexibility?
> >>>
> >>> Thanks!
> >>> Dong
> >>>
> >>>
> >>>
> >>> On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <[email protected].invalid> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> The Flink community introduced a new way of doing leader election in
> >>>> Flink 1.15 with FLINK-24038 [1]. Instead of a per-component leader
> >>>> election, all components (i.e. ResourceManager, Dispatcher, REST server,
> >>>> JobMaster) use a single (per-JM-process) leader election instance. It was
> >>>> meant to fix some issues with deregistering Flink applications in multi-JM
> >>>> setups [1] and to reduce load on the HA backend. Users were able to opt
> >>>> out and switch back to the old implementation [2].
> >>>>
> >>>> The new approach was rather complicated to implement while still
> >>>> maintaining support for the old implementation through the existing
> >>>> interfaces. With FLINK-25806 [3], the old implementation was removed in
> >>>> Flink 1.16. This enables us to clean things up in the
> >>>> HighAvailabilityServices.
> >>>>
> >>>> The proposed change would mean touching the HighAvailabilityServices
> >>>> interface. Currently, the interface provides factory methods for the
> >>>> LeaderElectionServices of the aforementioned components. All of these
> >>>> LeaderElectionServices are internally based on the same LeaderElection
> >>>> instance handled in DefaultMultipleComponentLeaderElectionService.
> >>>> Therefore, we can replace all these factory methods with a single one that
> >>>> returns a LeaderElectionService instance to be used by all components.
> >>>> Of course, we could also stick to the old HighAvailabilityServices and
> >>>> return the same LeaderElectionService instance through each of the four
> >>>> factory methods (similar to what’s done now with the
> >>>> MultipleComponentLeaderElectionService).
> >>>>
> >>>> A similar question arises for the corresponding LeaderRetrievalService:
> >>>> We could create a single listener instead of having individual
> >>>> per-component listeners, to reflect the current requirement of a
> >>>> per-JM-process leader election and to align it with the
> >>>> LeaderElectionService approach (if we decide on modifying the HA
> >>>> interface).
> >>>>
> >>>> I didn’t create a dedicated FLIP because HighAvailabilityServices are not
> >>>> considered a public interface. Still, I am aware that this might affect
> >>>> users (e.g. if they implemented their own HA services or if their project
> >>>> monitors HA information in the HA backend outside of Flink). That’s why I
> >>>> wanted to start a discussion here. I’m happy to create a FLIP if someone
> >>>> thinks it’s worth it. The work is going to be covered by FLINK-26522 [4].
> >>>>
> >>>> Pros (for changing the interface methods):
> >>>>
> >>>>     - It reflects the requirements stated in FLINK-24038 [1] about
> >>>>       having a per-JM-process LeaderElection
> >>>>     - It helps reduce the complexity of the JobManager
> >>>>
> >>>> Cons:
> >>>>
> >>>>     - We lose some flexibility in terms of per-component LeaderElection
> >>>>     - The interface change might affect other projects that customize
> >>>>       HA services
> >>>>
> >>>> I’m in favor of reducing the number of factory methods in
> >>>> HighAvailabilityServices, considering that it’s not a public interface.
> >>>> I’m looking forward to your opinions.
> >>>>
> >>>> Matthias
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/FLINK-24038
> >>>> [2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> >>>> [3] https://issues.apache.org/jira/browse/FLINK-25806
> >>>> [4] https://issues.apache.org/jira/browse/FLINK-26522
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Matthias Pohl
> >>>>
> >>>> Software Engineer, Aiven
> >>>>
> >>>> Aiven Deutschland GmbH
> >>>>
> >>>> Immanuelkirchstraße 26, 10405 Berlin
> >>>>
> >>>> Managing Directors: Oskari Saarenmaa & Hannu Valtonen
> >>>>
> >>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>
>
>
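For reference, the interface consolidation proposed in the original message (one factory method returning a single LeaderElectionService shared by all components, instead of one factory method per component) can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch: PerComponentHaServices, PerProcessHaServices, ProcessLeaderElection, perProcess, perComponentFacade, and all method signatures are invented for illustration and are not Flink's actual HighAvailabilityServices or LeaderElectionService API.

```java
import java.util.UUID;

// Hypothetical, simplified sketch; none of these types are Flink's real interfaces.
public class HaServicesSketch {

    // Stand-in for a leader election service (hypothetical, heavily simplified).
    public interface LeaderElectionService {
        boolean hasLeadership(UUID sessionId);
    }

    // "Before": one factory method per component (method names are illustrative).
    public interface PerComponentHaServices {
        LeaderElectionService getResourceManagerLeaderElectionService();
        LeaderElectionService getDispatcherLeaderElectionService();
        LeaderElectionService getRestEndpointLeaderElectionService();
        LeaderElectionService getJobMasterLeaderElectionService(String jobId);
    }

    // "After": a single factory method; the returned instance is shared by all components.
    public interface PerProcessHaServices {
        LeaderElectionService getLeaderElectionService();
    }

    // One election per JM process: winning it makes every component of the process leader.
    public static final class ProcessLeaderElection implements LeaderElectionService {
        private UUID currentSession; // null while this process does not hold leadership

        public synchronized void grantLeadership(UUID sessionId) {
            currentSession = sessionId;
        }

        public synchronized void revokeLeadership() {
            currentSession = null;
        }

        @Override
        public synchronized boolean hasLeadership(UUID sessionId) {
            return currentSession != null && currentSession.equals(sessionId);
        }
    }

    // Consolidated shape: the single factory method hands out the process-wide election.
    public static PerProcessHaServices perProcess(LeaderElectionService shared) {
        return () -> shared;
    }

    // Keeping the old shape: every per-component factory method returns the same instance.
    public static PerComponentHaServices perComponentFacade(LeaderElectionService shared) {
        return new PerComponentHaServices() {
            @Override
            public LeaderElectionService getResourceManagerLeaderElectionService() {
                return shared;
            }

            @Override
            public LeaderElectionService getDispatcherLeaderElectionService() {
                return shared;
            }

            @Override
            public LeaderElectionService getRestEndpointLeaderElectionService() {
                return shared;
            }

            @Override
            public LeaderElectionService getJobMasterLeaderElectionService(String jobId) {
                return shared;
            }
        };
    }

    public static void main(String[] args) {
        ProcessLeaderElection election = new ProcessLeaderElection();
        PerComponentHaServices facade = perComponentFacade(election);
        PerProcessHaServices consolidated = perProcess(election);

        UUID session = UUID.randomUUID();
        election.grantLeadership(session);
        // Dispatcher and ResourceManager leadership cannot diverge: both print true.
        System.out.println(facade.getDispatcherLeaderElectionService().hasLeadership(session));
        System.out.println(facade.getResourceManagerLeaderElectionService().hasLeadership(session));

        election.revokeLeadership();
        // Losing the process-wide election revokes leadership for every component: prints false.
        System.out.println(consolidated.getLeaderElectionService().hasLeadership(session));
    }
}
```

The facade variant corresponds to the alternative of keeping the old interface and returning the same instance from each of the four factory methods. In both variants the components can no longer hold leadership independently, which is the per-component flexibility debated in this thread. The same consolidation could be applied on the LeaderRetrievalService side, which this sketch does not cover.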
