Re: [DISCUSS] Make Kubernetes Operator config "dynamic" and consider merging with flinkConfiguration

Gyula Fóra Mon, 04 Apr 2022 23:43:59 -0700

Hi Thomas!

At the moment there are not many candidates among the existing options, probably:
operator.observer.flink.client.timeout
operator.reconciler.flink.cancel.job.timeout


So the question is more regarding features that we are introducing going
forward :)

Cheers,
Gyula

On Tue, Apr 5, 2022 at 6:51 AM Thomas Weise <t...@apache.org> wrote:

> Hi Gyula,
>
> Can you please specify which of the current settings you see as
> candidates to expose to the user? Force upgrade definitely makes sense
> and there will likely be others going forward that govern the upgrade
> or shutdown behavior. The main consideration should probably be that
> settings that can be controlled through the deployment should not
> affect other deployments.
>
> I would also prefer to add these to flinkConfiguration vs.
> introducing a separate operatorConfiguration.
>
> Thanks,
> Thomas
>
>
> On Sat, Apr 2, 2022 at 12:21 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > That's a very good point Matyas, we cannot risk any interference with
> other jobs but I think we don't necessarily have to.
> >
> > First of all we should only allow users to overwrite selected configs.
> For deciding what to allow, we can separate the operator related configs
> into 2 main groups:
> >
> > Group 1: Config options that are specific to the reconciliation logic of
> a specific job such as feature flags etc (for example
> https://issues.apache.org/jira/browse/FLINK-26926).
> > These configs cannot possibly cause interference, they are part of the
> natural reconciliation logic.
> >
> > Group 2: Config options that actually affect the controller scheduling,
> memory/cpu requirements. These are the problematic ones as they can
> actually break the operator if we are not careful.
> >
> > For Group 1 there are no safeguards necessary and I would say this is
> the primary use-case I wanted to cover with this discussion.
> >
> > I think Group 2 could also be supported as long as we explicitly
> validate the values, for example that scheduling delays are within
> pre-configured bounds. One example would be configuring client timeouts:
> there could be special cases where the operator's hardcoded timeout is not
> good enough, but we would still want to set a hard max bound on the
> configurable value.
> >
> > Cheers,
> > Gyula
> >
> >
> >
> > On Sat, Apr 2, 2022 at 8:57 AM Őrhidi Mátyás <matyas.orh...@gmail.com>
> wrote:
> >>
> >> Thanks Gyula for bringing this topic up! Although the suggestion would
> >> indeed simplify the configuration handling I have some concerns about
> >> opening the operator configuration for end users in certain cases. In a
> >> multitenant scenario for example, how could we protect against one user
> >> messing up the configs and potentially disrupting others? As I see it, the
> >> operator acts as the control plane, ideally totally transparent for end
> >> users, often behind a rest API. Let me know what you think.
> >>
> >> Cheers,
> >> Matyas
> >>
> >> On Sat, Apr 2, 2022 at 5:12 AM Yang Wang <danrtsey...@gmail.com> wrote:
> >>
> >> > I also like proposal 2. Maybe it could be named
> >> > *KubernetesOperatorConfigOptions*, which looks just like the other
> >> > ConfigOption classes (e.g. *KubernetesConfigOptions, YarnConfigOptions*) in
> Flink.
> >> > Proposal 2 is more natural and easier to use for Flink users.
> >> >
> >> >
> >> > Best,
> >> > Yang
> >> >
> >> > On Sat, Apr 2, 2022 at 02:25 Gyula Fóra <gyf...@apache.org> wrote:
> >> >
> >> >> Hi Devs!
> >> >>
> >> >> *Background*:
> >> >> With more and more features and options added to the flink kubernetes
> >> >> operator it would make sense to not expose everything as first class
> >> >> options in the deployment/jobspec (same as we do for flink
> configuration
> >> >> currently).
> >> >>
> >> >> Furthermore it would be beneficial if users could control
> reconciliation
> >> >> specific settings like timeouts, reschedule delays etc on a per
> deployment
> >> >> basis.
> >> >>
> >> >>
> >> >> *Proposal 1* The more conservative proposal would be to add a new
> >> >> *operatorConfiguration* field to the deployment spec that the
> operator
> >> >> would use during the controller loop (merged with the default
> operator
> >> >> config). This makes the operator very extensible with new options and
> >> >> would
> >> >> also allow overrides to the default operator config on a per
> deployment
> >> >> basis.
> >> >>
> >> >>
> >> >> *Proposal 2* I would actually go one step further and propose that we
> >> >> should
> >> >> merge *flinkConfiguration* and *operatorConfiguration*, as whether
> >> >> something affects the flink job submission/job or the operator
> behaviour
> >> >> does not really make a difference to the end user. For users the
> operator
> >> >> is part of flink so having multiple configuration maps could simply
> >> >> cause
> >> >> confusion.
> >> >> We could simply prefix all operator related configs with
> >> >> `kubernetes.operator` to ensure that we do not accidentally conflict
> with
> >> >> flink native config options.
> >> >> If we go this route I would even go as far as naming it simply
> >> >> *configuration* for the sake of simplicity.
> >> >>
> >> >> I personally would go with proposal 2 to make this as simple as
> possible
> >> >> for the users.
> >> >>
> >> >> Please let me know what you think!
> >> >> Gyula
> >> >>
> >> >
>
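
The two ideas discussed in the thread (a shared `kubernetes.operator` prefix
inside one merged configuration map, and bounds-checked per-deployment
overrides for "Group 2" options) could look roughly like the sketch below. It
is written in Python only for brevity and is not the operator's actual
implementation. The full key names (the prefix joined with the two timeout
options named above), the allow-list, and all default/min/max values are
assumptions made up for illustration.

```python
from dataclasses import dataclass

# Prefix proposed in the thread for operator-related keys.
OPERATOR_PREFIX = "kubernetes.operator."


@dataclass(frozen=True)
class BoundedOption:
    """An overridable option with hypothetical bounds, in seconds."""
    default_s: float
    min_s: float
    max_s: float


# Hypothetical allow-list: only these keys may be overridden per deployment.
# The numeric values are invented for this example.
ALLOWED_OVERRIDES = {
    OPERATOR_PREFIX + "observer.flink.client.timeout": BoundedOption(10, 1, 60),
    OPERATOR_PREFIX + "reconciler.flink.cancel.job.timeout": BoundedOption(60, 10, 300),
}


def split_config(merged):
    """Split one merged map into (flink_config, operator_overrides) by prefix."""
    operator = {k: v for k, v in merged.items() if k.startswith(OPERATOR_PREFIX)}
    flink = {k: v for k, v in merged.items() if not k.startswith(OPERATOR_PREFIX)}
    return flink, operator


def effective_operator_config(overrides):
    """Apply per-deployment overrides on top of defaults, rejecting bad values."""
    effective = {key: opt.default_s for key, opt in ALLOWED_OVERRIDES.items()}
    for key, raw in overrides.items():
        opt = ALLOWED_OVERRIDES.get(key)
        if opt is None:
            raise ValueError(f"{key} cannot be overridden per deployment")
        value = float(raw)
        if not opt.min_s <= value <= opt.max_s:
            raise ValueError(
                f"{key}={value} is outside the allowed range [{opt.min_s}, {opt.max_s}]"
            )
        effective[key] = value
    return effective
```

Rejecting out-of-range values instead of silently clamping them is a design
choice in this sketch; either approach keeps one deployment's settings from
exceeding the limits set by the operator's administrator.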
