Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Chesnay Schepler Fri, 09 Dec 2022 09:29:49 -0800

I generally agree that the internals of the HA services are currently too complex, but I'm wondering if the proposal doesn't go a bit too far to resolve those. Is there maybe some way we can refactor things internally to reduce complexity while keeping the per-component semantics?

Ultimately, the per-component leader election gives us the theoretical ability to split components into separate processes, which is also something we strive to maintain in other layers like the RPC system.

That's a powerful property to have, which is also quite difficult to patch back in once you get rid of it.

Of note, whenever a discussion came up about scalability of the JM process the first answer has _always_ been "well we could split it up at one point if it's necessary.".

> I am curious that there are so many such extreme requirements that we have to rely on the per-component pattern to achieve them?

This doesn't necessarily go into the _extreme_ direction. It could be something as simple as running the UI in an environment that is more accessible than the other processes, running multiple UIs for load-balancing purposes without paying the additional memory tax of a full JM, or the Dispatcher process not running any user-code (== some isolation between jobs). The original FLIP-6 design had ideas to that end, and they aren't really bad ideas. We just never executed them.


> users may inadvertently recreate problems similar to FLINK-24038

That's certainly a risk, but the per-process leader election was just one possible solution, which happened to have other benefits at the time.

Right now I unfortunately can't provide specific ideas on how we could clean things up internally; that'd take some time that I won't have until next year.


On 09/12/2022 16:41, weijie guo wrote:

Hi Matthias,

Thanks for the proposal! I am in favor of cleaning up this interface, as
it seems a bit cumbersome now, especially since the implementation of
per-component leader election has been removed from our current code path.

To be honest, I don't like the per-component approach. I am even often asked
why Flink works this way. Of course, I admit that it makes our HA
service more flexible. But personally, I think the per-process solution is
better, at least from the perspective of reducing potential problems
like FLINK-24038, and it can definitely reduce the complexity of the JobManager.

Regarding "We lose some flexibility in terms of per-component
LeaderElection", I am curious that there are so many such extreme
requirements that we have to rely on the per-component pattern to achieve
them? If there are, is this requirement really reasonable? After all, users may
inadvertently recreate problems similar to FLINK-24038.

Best regards,

Weijie


Matthias Pohl <[email protected]> wrote on Fri, Dec 9, 2022 at 17:47:

Hi Dong,
see my answers below.

> Regarding "Interface change might affect other projects that customize HA
> services", are you referring to those projects which hack into Flink's
> source code (as opposed to using Flink's public API) to customize HA
> services?


Yes, the proposed change might affect projects that need to have their own
HA implementation for whatever reason (interface change) or if a project
accesses the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
(change about how the data is stored in the HA backend). The latter one was
actually already the case with the change introduced by FLINK-24038 [1].

> By the way, since Flink already supports zookeeper and kubernetes as the
> high availability services, are you aware of many projects that still need
> to hack into Flink's code to customize high availability services?


I am aware of projects that use customized HA. But based on our experience
in FLINK-24038 [1] no one complained. So, making people aware through the
mailing list might be good enough.

> And regarding "We lose some flexibility in terms of per-component
> LeaderElection", could you explain what flexibility we need so that we can
> gauge the associated downside of losing the flexibility?


Just to recap: The current interface allows having per-component
LeaderElection (e.g. the ResourceManager leader can run on a different
JobManager than the Dispatcher). This implementation was replaced by
FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
LeaderElection per process (e.g. ResourceManager and Dispatcher always run
on the same JobManager). The changed interface would require us to touch
the interface again if (for whatever reason) we want to reintroduce
per-component leader election in some form.
The interface change is, strictly speaking, not necessary to provide the
new functionality. But I like the idea of certain requirements (currently,
we need per-process leader election to fix what was reported in FLINK-24038
[1]) being reflected in the interface. This makes sure that we don't
introduce a per-component leader election again accidentally in the future
because we thought it's a good idea but forgot about FLINK-24038.
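
To make the recap concrete, here is a minimal sketch of the difference between the two granularities. It is only a toy model: the class and method names (ElectionGranularitySketch, perComponent, perProcess) and the "jm-1"/"jm-2" identifiers are hypothetical and are not part of Flink's code base. Each election is reduced to "pick an index into the list of JobManager processes".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Toy model only; none of these names exist in Flink.
final class ElectionGranularitySketch {

    /** Per-component: every component runs its own election, so the winners may differ. */
    static Map<String, String> perComponent(
            List<String> components,
            List<String> jobManagers,
            ToIntFunction<String> electionOutcome) {
        Map<String, String> leaders = new LinkedHashMap<>();
        for (String component : components) {
            // Independent election per component.
            leaders.put(component, jobManagers.get(electionOutcome.applyAsInt(component)));
        }
        return leaders;
    }

    /** Per-process: one election decides the leading JobManager for all components. */
    static Map<String, String> perProcess(
            List<String> components, List<String> jobManagers, int winnerIndex) {
        Map<String, String> leaders = new LinkedHashMap<>();
        String leadingProcess = jobManagers.get(winnerIndex);
        for (String component : components) {
            leaders.put(component, leadingProcess);
        }
        return leaders;
    }

    public static void main(String[] args) {
        List<String> components = List.of("ResourceManager", "Dispatcher");
        List<String> jobManagers = List.of("jm-1", "jm-2");

        // Per-component elections can end up with leaders on different JobManagers.
        System.out.println(
                perComponent(components, jobManagers, c -> c.equals("Dispatcher") ? 1 : 0));

        // A per-process election always co-locates the leaders.
        System.out.println(perProcess(components, jobManagers, 0));
    }
}
```

In the per-component case the ResourceManager leader and the Dispatcher leader can land on different JobManagers; in the per-process case that outcome cannot be expressed at all, which is the property the proposed interface would encode.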

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-25806

On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <[email protected]> wrote:

Hi Matthias,

Thanks for the proposal! Overall I am in favor of making this interface
change to make Flink's codebase more maintainable.

Regarding "Interface change might affect other projects that customize HA
services", are you referring to those projects which hack into Flink's
source code (as opposed to using Flink's public API) to customize HA
services? If yes, it seems OK to break those projects since we don't have
any backward compatibility guarantee for those projects.

By the way, since Flink already supports zookeeper and kubernetes as the
high availability services, are you aware of many projects that still need
to hack into Flink's code to customize high availability services?

And regarding "We lose some flexibility in terms of per-component
LeaderElection", could you explain what flexibility we need so that we can
gauge the associated downside of losing the flexibility?

Thanks!
Dong



On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <[email protected]> wrote:

Hi everyone,

The Flink community changed how leader election works in Flink 1.15 with
FLINK-24038 [1]. Instead of a per-component leader election, all
components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
single (per-JM-process) leader election instance. It was meant to fix some
issues with deregistering Flink applications in multi-JM setups [1] and
reduce load on the HA backend. Users were able to opt out and switch back
to the old implementation [2].

The new approach was kind of complicated to implement while still
maintaining support for the old implementation through the existing
interfaces. With FLINK-25806 [3], the old implementation was removed in
Flink 1.16. This enables us to clean things up in the
HighAvailabilityServices.

The proposed change would mean touching the HighAvailabilityServices
interface. Currently, the interface provides factory methods for
LeaderElectionServices of the aforementioned components. All of these
LeaderElectionServices are internally based on the same LeaderElection
instance handled in DefaultMultipleComponentLeaderElectionService.
Therefore, we can replace all these factory methods by a single one which
returns a LeaderElectionService instance that’s going to be used by all
components. Of course, we could also stick to the old
HighAvailabilityServices and return the same LeaderElectionService instance
through each of the four factory methods (similar to what’s done now with
the MultipleComponentLeaderElectionService).
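
The two options in the paragraph above can be sketched roughly as follows. This is an illustration under assumptions, not Flink's actual interface: the type and method names (ElectionService, PerComponentFactories, SingleFactory, adapt) are hypothetical stand-ins chosen for this example, and the real signatures in HighAvailabilityServices differ.

```java
// Simplified sketch; all type and method names are hypothetical stand-ins,
// not the actual signatures of Flink's HighAvailabilityServices.
final class HaInterfaceSketch {

    /** Stand-in for a leader election service. */
    interface ElectionService {
        String id();
    }

    /** Shape of the current interface: one factory method per component. */
    interface PerComponentFactories {
        ElectionService resourceManagerElection();
        ElectionService dispatcherElection();
        ElectionService restEndpointElection();
        ElectionService jobMasterElection(String jobId);
    }

    /** Shape of the proposed interface: a single factory method for all components. */
    interface SingleFactory {
        ElectionService leaderElection();
    }

    /**
     * The alternative mentioned in the proposal: keep the four factory methods
     * but hand out the same shared instance through each of them.
     */
    static PerComponentFactories adapt(SingleFactory single) {
        final ElectionService shared = single.leaderElection();
        return new PerComponentFactories() {
            @Override public ElectionService resourceManagerElection() { return shared; }
            @Override public ElectionService dispatcherElection() { return shared; }
            @Override public ElectionService restEndpointElection() { return shared; }
            @Override public ElectionService jobMasterElection(String jobId) { return shared; }
        };
    }

    public static void main(String[] args) {
        ElectionService perProcess = () -> "jm-process-election";
        PerComponentFactories legacy = adapt(() -> perProcess);

        // All components observe the very same election instance.
        System.out.println(legacy.dispatcherElection() == legacy.resourceManagerElection());
    }
}
```

With the adapter, the old interface shape survives but no longer tells the reader that all four methods share one election; the single-factory shape makes that requirement visible in the type itself.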

A similar question appears for the corresponding LeaderRetrievalService: We
could create a single listener instead of having individual per-component
listeners to reflect the current requirement of having a per-JM-process
leader election and align it with the LeaderElectionService approach (if we
decide on modifying the HA interface).
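
As a rough illustration of the single-listener idea, the sketch below registers one listener per JM process and lets it observe every leader change. The names (LeaderListener, ProcessLeaderRetrieval, onLeader, leaderChanged) and the "jm-1:6123" style addresses are assumptions made for this example; they are not Flink's LeaderRetrievalService API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch; these types do not exist in Flink under these names.
final class SingleListenerSketch {

    /** Stand-in for a leader retrieval listener. */
    interface LeaderListener {
        void onLeader(String address, UUID sessionId);
    }

    /** One retrieval per JM process: a single listener sees every leader change. */
    static final class ProcessLeaderRetrieval {
        private LeaderListener listener;

        void start(LeaderListener newListener) {
            if (listener != null) {
                throw new IllegalStateException("retrieval already started");
            }
            listener = newListener;
        }

        /** Called by the (simulated) HA backend whenever the leading process changes. */
        void leaderChanged(String address, UUID sessionId) {
            if (listener != null) {
                listener.onLeader(address, sessionId);
            }
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        ProcessLeaderRetrieval retrieval = new ProcessLeaderRetrieval();
        retrieval.start((address, sessionId) -> seen.add(address));

        retrieval.leaderChanged("jm-1:6123", UUID.randomUUID());
        retrieval.leaderChanged("jm-2:6123", UUID.randomUUID());
        System.out.println(seen);
    }
}
```

Components that need the leader information would then consume it from this one listener rather than each registering its own, mirroring the single LeaderElectionService on the election side.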

I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
considered a public interface. Still, I am aware it might affect users
(e.g. if they implemented their own HA services or if the project monitors
HA information in the HA backend outside of Flink). That’s why I wanted to
start a discussion here. I’m happy to create a FLIP, if someone thinks it’s
worth it. The work is going to be covered by FLINK-26522 [4].

Pros (for changing the interface methods):

    - It reflects the requirements stated in FLINK-24038 [1] about having
      per-JM-process LeaderElection
    - It helps reduce the complexity of the JobManager

Cons:

    - We lose some flexibility in terms of per-component LeaderElection
    - Interface change might affect other projects that customize HA services


I’m in favor of reducing the number of factory methods in
HighAvailabilityServices considering that it’s not a public interface. I’m
looking forward to your opinions.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038

[2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services

[3] https://issues.apache.org/jira/browse/FLINK-25806

[4] https://issues.apache.org/jira/browse/FLINK-26522


--

Matthias Pohl
Software Engineer, Aiven