I generally agree that the internals of the HA services are currently too complex, but I'm wondering if the proposal doesn't go a bit too far to resolve those. Is there maybe some way we can refactor things internally to reduce complexity while keeping the per-component semantics?

Ultimately, the per-component leader election gives us the theoretical ability to split components into separate processes, which is also something we strive to maintain in other layers like the RPC system.

That's a powerful property to have, which is also quite difficult to patch back in once you get rid of it.

Of note, whenever a discussion came up about the scalability of the JM process, the first answer has _always_ been "well, we could split it up at some point if it's necessary."

> I am curious whether there are really so many such extreme requirements that we have to rely on the per-component pattern to achieve them.

This doesn't necessarily go into the _extreme_ direction. It could be something as simple as running the UI in an environment that is more accessible than the other processes, running multiple UIs for load-balancing purposes without paying the additional memory tax of a full JM, or the Dispatcher process not running any user-code (== some isolation between jobs). The original FLIP-6 design had ideas to that end, and they aren't really bad ideas. We just never executed them.

> users may inadvertently recreate problems similar to FLINK-24038

That's certainly a risk, but the per-process leader election was just one possible solution, which also happened to have other benefits at the time.



Right now I unfortunately can't provide specific ideas on how we could clean things up internally; that'd take some time that I won't have until next year.

On 09/12/2022 16:41, weijie guo wrote:
Hi Matthias,

Thanks for the proposal! I am in favor of cleaning up this interface; it
seems a bit cumbersome now, especially since the implementation of
per-component leader election has been removed from our current code path.

To be honest, I don't like the per-component approach. I'm even often asked
why Flink took this approach. Of course, I admit that it makes our HA
services more flexible. But personally, I think the per-process solution is
better, at least from the perspective of reducing potential problems like
FLINK-24038, and it can definitely reduce the complexity of the JobManager.

Regarding "We lose some flexibility in terms of per-component
LeaderElection", I am curious whether there are really so many such extreme
requirements that we have to rely on the per-component pattern to achieve
them. If there are, are these requirements really reasonable? Users may
inadvertently recreate problems similar to FLINK-24038.

Best regards,

Weijie


On Fri, Dec 9, 2022 at 17:47, Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:

Hi Dong,
see my answers below.

Regarding "Interface change might affect other projects that customize HA
services", are you referring to those projects which hack into Flink's
source code (as opposed to using Flink's public API) to customize HA
services?

Yes, the proposed change might affect projects that need their own HA
implementation for whatever reason (because of the interface change), or
projects that access the HA backend to retrieve metadata from the ZK
node/k8s ConfigMap (because of the change in how the data is stored in the
HA backend). The latter was actually already the case with the change
introduced by FLINK-24038 [1].

By the way, since Flink already supports ZooKeeper and Kubernetes as the
high availability services, are you aware of many projects that still need
to hack into Flink's code to customize high availability services?

I am aware of projects that use customized HA services. But based on our
experience with FLINK-24038 [1], no one complained. So, making people aware
through the mailing list might be good enough.

And regarding "We lose some flexibility in terms of per-component
LeaderElection", could you explain what flexibility we need so that we can
gauge the associated downside of losing the flexibility?

Just to recap: The current interface allows having per-component
LeaderElection (e.g. the ResourceManager leader can run on a different
JobManager than the Dispatcher). This implementation was replaced by
FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
LeaderElection per process (e.g. ResourceManager and Dispatcher always run
on the same JobManager). The changed interface would require us to touch
the interface again if (for whatever reason) we want to reintroduce
per-component leader election in some form.
The interface change is, strictly speaking, not necessary to provide the
new functionality. But I like the idea of reflecting certain requirements in
the interface (currently, we need per-process leader election to fix what
was reported in FLINK-24038 [1]). This makes sure that we don't accidentally
reintroduce per-component leader election in the future because we thought
it was a good idea but forgot about FLINK-24038.
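To make the per-component vs. per-process distinction concrete, here is a
minimal, self-contained Java sketch of per-process leader election: one
election per JobManager process whose outcome is forwarded to every
registered component, so the ResourceManager and Dispatcher can never lead
on different JobManagers. All types (`Contender`, `PerProcessLeaderElection`,
`RecordingContender`) are hypothetical simplifications for illustration, not
Flink's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical, simplified stand-in for a component competing for leadership.
interface Contender {
    void grantLeadership(UUID sessionId);
    void revokeLeadership();
}

// Hypothetical per-process election: a single outcome drives all components.
final class PerProcessLeaderElection {
    private final List<Contender> contenders = new ArrayList<>();
    private UUID currentSession; // null while this process is not the leader

    void register(Contender contender) {
        contenders.add(contender);
        if (currentSession != null) {
            // Late registrants immediately join the existing leadership session.
            contender.grantLeadership(currentSession);
        }
    }

    // Assumed to be called by the HA backend when this process wins the election.
    void onLeadershipAcquired() {
        currentSession = UUID.randomUUID();
        for (Contender c : contenders) {
            c.grantLeadership(currentSession);
        }
    }

    // Assumed to be called when this process loses leadership: all components step down together.
    void onLeadershipLost() {
        currentSession = null;
        for (Contender c : contenders) {
            c.revokeLeadership();
        }
    }
}

// Records whether a named component currently considers itself the leader.
final class RecordingContender implements Contender {
    final String name;
    UUID session;

    RecordingContender(String name) { this.name = name; }

    @Override public void grantLeadership(UUID sessionId) { session = sessionId; }
    @Override public void revokeLeadership() { session = null; }
    boolean isLeader() { return session != null; }
}

class PerProcessElectionDemo {
    public static void main(String[] args) {
        PerProcessLeaderElection election = new PerProcessLeaderElection();
        RecordingContender rm = new RecordingContender("ResourceManager");
        RecordingContender dispatcher = new RecordingContender("Dispatcher");
        election.register(rm);
        election.register(dispatcher);

        election.onLeadershipAcquired();
        // Both components share one leadership session on the same process.
        System.out.println(rm.isLeader() && rm.session.equals(dispatcher.session)); // true

        election.onLeadershipLost();
        System.out.println(rm.isLeader() || dispatcher.isLeader()); // false
    }
}
```

The trade-off discussed in this thread is visible here: per-component
election would give each component its own election instance, which keeps
the option of running components in separate processes but also allows
their leaders to diverge across JobManagers.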

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-25806

On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <lindon...@gmail.com> wrote:

Hi Matthias,

Thanks for the proposal! Overall I am in favor of making this interface
change to make Flink's codebase more maintainable.

Regarding "Interface change might affect other projects that customize HA
services", are you referring to those projects which hack into Flink's
source code (as opposed to using Flink's public API) to customize HA
services? If yes, it seems OK to break those projects since we don't have
any backward compatibility guarantee for those projects.

By the way, since Flink already supports ZooKeeper and Kubernetes as the
high availability services, are you aware of many projects that still need
to hack into Flink's code to customize high availability services?

And regarding "We lose some flexibility in terms of per-component
LeaderElection", could you explain what flexibility we need so that we can
gauge the associated downside of losing the flexibility?

Thanks!
Dong



On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.p...@aiven.io.invalid> wrote:

Hi everyone,

The Flink community introduced a new way of doing leader election in Flink
1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all
components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
single (per-JM-process) leader election instance. It was meant to fix some
issues with deregistering Flink applications in multi-JM setups [1] and to
reduce load on the HA backend. Users were able to opt out and switch back
to the old implementation [2].

The new approach was kind of complicated to implement while still
maintaining support for the old implementation through the existing
interfaces. With FLINK-25806 [3], the old implementation was removed in
Flink 1.16. This enables us to clean things up in the
HighAvailabilityServices.

The proposed change would mean touching the HighAvailabilityServices
interface. Currently, the interface provides factory methods for the
LeaderElectionServices of the aforementioned components. All of these
LeaderElectionServices are internally based on the same LeaderElection
instance handled in DefaultMultipleComponentLeaderElectionService.
Therefore, we can replace all these factory methods with a single one that
returns a LeaderElectionService instance used by all components. Of course,
we could also stick to the old HighAvailabilityServices interface and return
the same LeaderElectionService instance through each of the four factory
methods (similar to what’s done now with the
MultipleComponentLeaderElectionService).
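The two options described above could look roughly like the following
sketch. The interface and method names are hypothetical and simplified (they
only loosely mirror the per-component factory methods mentioned above, and
the real JobMaster variant is per job); `LeaderElectionService` here is a
placeholder type, not Flink's actual interface. One option collapses the
four factory methods into one; the other keeps the four methods but returns
the same shared instance from each.

```java
import java.util.Objects;

// Placeholder for a leader election service; not Flink's real interface.
interface LeaderElectionService {
    String describe();
}

// Existing shape (simplified, hypothetical names): one factory method per component.
interface PerComponentHaServices {
    LeaderElectionService getResourceManagerLeaderElectionService();
    LeaderElectionService getDispatcherLeaderElectionService();
    LeaderElectionService getJobMasterLeaderElectionService();
    LeaderElectionService getRestEndpointLeaderElectionService();
}

// Proposed shape (hypothetical name): a single factory method makes the
// per-JM-process requirement explicit in the type.
interface PerProcessHaServices {
    LeaderElectionService getLeaderElectionService();
}

// Keeping the old shape: every per-component method hands out the same
// shared instance, so all components still use one election.
final class SharedElectionHaServices implements PerComponentHaServices, PerProcessHaServices {
    private final LeaderElectionService shared;

    SharedElectionHaServices(LeaderElectionService shared) {
        this.shared = Objects.requireNonNull(shared);
    }

    @Override public LeaderElectionService getLeaderElectionService() { return shared; }
    @Override public LeaderElectionService getResourceManagerLeaderElectionService() { return shared; }
    @Override public LeaderElectionService getDispatcherLeaderElectionService() { return shared; }
    @Override public LeaderElectionService getJobMasterLeaderElectionService() { return shared; }
    @Override public LeaderElectionService getRestEndpointLeaderElectionService() { return shared; }
}

class HaServicesDemo {
    public static void main(String[] args) {
        LeaderElectionService process = () -> "per-JM-process election";
        SharedElectionHaServices services = new SharedElectionHaServices(process);
        // Every component receives the identical instance.
        System.out.println(services.getResourceManagerLeaderElectionService()
                == services.getDispatcherLeaderElectionService()); // true
    }
}
```

The single-method interface is closer to what the proposal argues for: a
caller simply cannot ask for a separate per-component election, so the
FLINK-24038 requirement is encoded in the type rather than in a convention.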

A similar question arises for the corresponding LeaderRetrievalService: we
could create a single listener instead of individual per-component
listeners, to reflect the current requirement of a per-JM-process leader
election and to align it with the LeaderElectionService approach (if we
decide on modifying the HA interface).
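On the retrieval side, a single listener could receive the leader
information for the whole JobManager process at once, for instance as one
session ID plus a map from component to address, instead of one listener per
component. This is a hypothetical sketch: `ProcessLeaderListener`,
`ProcessLeaderInfo`, `ProcessLeaderRetrieval`, the `Component` keys, and the
example addresses are illustrative names, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Components that share the per-JM-process leadership (illustrative).
enum Component { RESOURCE_MANAGER, DISPATCHER, REST_ENDPOINT }

// One snapshot of the leading process: a single session plus each component's address.
final class ProcessLeaderInfo {
    final UUID sessionId;
    final Map<Component, String> addresses;

    ProcessLeaderInfo(UUID sessionId, Map<Component, String> addresses) {
        this.sessionId = sessionId;
        this.addresses = Map.copyOf(addresses); // immutable defensive copy
    }
}

// A single listener replaces the per-component listeners.
interface ProcessLeaderListener {
    void notifyLeaderChanged(ProcessLeaderInfo info);
}

final class ProcessLeaderRetrieval {
    private final List<ProcessLeaderListener> listeners = new ArrayList<>();

    void addListener(ProcessLeaderListener listener) { listeners.add(listener); }

    // Assumed to be invoked when the HA backend reports a new leading process.
    void publish(ProcessLeaderInfo info) {
        for (ProcessLeaderListener l : listeners) {
            l.notifyLeaderChanged(info);
        }
    }
}

class RetrievalDemo {
    public static void main(String[] args) {
        ProcessLeaderRetrieval retrieval = new ProcessLeaderRetrieval();
        retrieval.addListener(info ->
                System.out.println("Dispatcher at " + info.addresses.get(Component.DISPATCHER)));
        retrieval.publish(new ProcessLeaderInfo(UUID.randomUUID(), Map.of(
                Component.RESOURCE_MANAGER, "jm-1/resourcemanager",
                Component.DISPATCHER, "jm-1/dispatcher",
                Component.REST_ENDPOINT, "jm-1/rest")));
        // prints "Dispatcher at jm-1/dispatcher"
    }
}
```

Because all addresses arrive in one notification with one session ID, a
client can never observe, say, a ResourceManager leader from one process and
a Dispatcher leader from another.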

I didn’t create a dedicated FLIP because HighAvailabilityServices are not
considered a public interface. Still, I am aware it might affect users
(e.g. if they implemented their own HA services or if their project monitors
HA information in the HA backend outside of Flink). That’s why I wanted to
start a discussion here. I’m happy to create a FLIP if someone thinks it’s
worth it. The work is going to be covered by FLINK-26522 [4].

Pros (for changing the interface methods):

    - It reflects the requirements stated in FLINK-24038 [1] about having a
      per-JM-process LeaderElection
    - It helps reduce the complexity of the JobManager

Cons:

    - We lose some flexibility in terms of per-component LeaderElection
    - Interface change might affect other projects that customize HA
      services

I’m in favor of reducing the number of factory methods in
HighAvailabilityServices, considering that it’s not a public interface.
I’m looking forward to your opinions.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
[3] https://issues.apache.org/jira/browse/FLINK-25806
[4] https://issues.apache.org/jira/browse/FLINK-26522


--

Matthias Pohl

Software Engineer, Aiven

matthias.p...@aiven.io <i...@aiven.io>


