Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Zheng Yu Chen Sat, 10 Dec 2022 20:38:35 -0800

Thanks, Matthias. I have read the previous emails in this thread and would
like to share my own views on some of the issues.


@Matthias

My opinion is that the per-component high-availability scheme should be
retained. As David mentioned, if we need to split the components, each of
them will need its own LeaderElectionService. Of course, for the single-JM
case I have no objection to merging them into one. But considering that the
JM may be able to support horizontal scaling [1] in the future, I suggest
keeping the per-component scheme.

@David:

I agree with you. We should rethink how to split the heavy components in
the JM and how to support high availability for each of them, instead of
simply merging everything into a single LeaderElectionService that is
returned to all components.
If you have more ideas and suggestions for FLIP-257 [1], we can move to the
FLIP-257 discussion thread [2].

@Dong:

I think the per-component high-availability scheme should be retained, as
David commented.
I have been researching related approaches and waiting for more feedback
from the community, because this work is more complicated than I imagined,
and I am afraid that I cannot complete such a large project by myself. What
I have proposed so far is only a preliminary solution. In fact, I have
considered splitting out each service and giving each its own HA, but as
noted in the FLIP-257 [1] discussion thread [2], this would increase the
complexity of communication between components. If you have more ideas and
suggestions for FLIP-257 [1], we can move to the FLIP-257 discussion
thread [2].


[1] FLIP-257: Flink JobManager Process Split:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257%3A+Flink+JobManager+Process+Split
[2] FLIP-257 discussion thread:
https://lists.apache.org/thread/r3fnw13j5h04z87lb34l42nvob4pq2xj

Matthias Pohl <matthias.p...@aiven.io.invalid> wrote on Wed, Dec 7, 2022 at 16:28:

> Hi everyone,
>
> The Flink community changed how leader election works in Flink
> 1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all
> components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
> single (per-JM-process) leader election instance. It was meant to fix some
> issues with deregistering Flink applications in multi-JM setups [1] and
> reduce load on the HA backend. Users were able to opt-out and switch back
> to the old implementation [2].
>
> The new approach was kind of complicated to implement while still
> maintaining support for the old implementation through the existing
> interfaces. With FLINK-25806 [3], the old implementation was removed in
> Flink 1.16. This enables us to clean things up in the
> HighAvailabilityServices.
>
> The proposed change would mean touching the HighAvailabilityServices
> interface. Currently, the interface provides factory methods for
> LeaderElectionServices of the aforementioned components. All of these
> LeaderElectionServices are internally based on the same LeaderElection
> instance handled in DefaultMultipleComponentLeaderElectionService.
> Therefore, we can replace all these factory methods by a single one which
> returns a LeaderElectionService instance that’s going to be used by all
> components. Of course, we could also stick to the old
> HighAvailabilityServices and return the same LeaderElectionService instance
> through each of the four factory methods (similar to what’s done now with
> the MultipleComponentLeaderElectionService).
>
> A similar question appears for the corresponding LeaderRetrievalService: We
> could create a single listener instead of having individual per-component
> listeners to reflect the current requirement of having a per-JM-process
> leader election and align it with the LeaderElectionService approach (if we
> decide on modifying the HA interface).
>
> I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
> considered a public interface. Still, I am aware it might affect users
> (e.g. if they implemented their own HA services or if the project monitors
> HA information in the HA backend outside of Flink). That’s why I wanted to
> start a discussion here. I’m happy to create a FLIP, if someone thinks it’s
> worth it. The work is going to be covered by FLINK-26522 [4]
>
> Pros (for changing the interface methods):
>
>    - It reflects the requirements stated in FLINK-24038 [1] about having a
>      per-JM-process LeaderElection
>    - It helps reduce the complexity of the JobManager
>
> Cons:
>
>    - We lose some flexibility in terms of per-component LeaderElection
>    - Interface change might affect other projects that customize HA services
>
>
> I’m in favor of reducing the number of factory methods in
> HighAvailabilityServices considering that it’s not a public interface. I’m
> looking forward to your opinions.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-24038
>
> [2]
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
>
> [3] https://issues.apache.org/jira/browse/FLINK-25806
>
> [4] https://issues.apache.org/jira/browse/FLINK-26522
>


-- 
Best

ConradJam
