Hi Thomas!

At the moment there are not many candidates among the existing options,
probably:

  operator.observer.flink.client.timeout
  operator.reconciler.flink.cancel.job.timeout
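Below is a minimal sketch of the safeguard discussed in this thread for such options. Per-deployment overrides are accepted only for allowlisted keys (here, the two timeouts above), and each value must stay within an operator-wide upper bound, as proposed for "Group 2" options further down. The class name, the 5-minute bound and the default values are hypothetical. Values are modeled as `java.time.Duration` in a plain `Map` rather than through Flink's `Configuration` API, which keeps the example self-contained.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: validate per-deployment overrides of operator settings. */
public class OperatorOverrides {

    // Keys users may override per deployment (the two candidates named above).
    static final Set<String> OVERRIDABLE = Set.of(
            "operator.observer.flink.client.timeout",
            "operator.reconciler.flink.cancel.job.timeout");

    // Hypothetical hard upper bound, set by the operator administrator.
    static final Duration MAX_TIMEOUT = Duration.ofMinutes(5);

    /** Returns operator defaults overlaid with validated user overrides. */
    static Map<String, Duration> effectiveConfig(
            Map<String, Duration> defaults, Map<String, Duration> userOverrides) {
        Map<String, Duration> result = new TreeMap<>(defaults);
        for (Map.Entry<String, Duration> e : userOverrides.entrySet()) {
            // Reject keys that are not explicitly opened up to users.
            if (!OVERRIDABLE.contains(e.getKey())) {
                throw new IllegalArgumentException("Not user-overridable: " + e.getKey());
            }
            // Reject values outside [0, MAX_TIMEOUT] instead of silently clamping.
            Duration value = e.getValue();
            if (value.isNegative() || value.compareTo(MAX_TIMEOUT) > 0) {
                throw new IllegalArgumentException(
                        e.getKey() + " must be between 0 and " + MAX_TIMEOUT + ", got " + value);
            }
            result.put(e.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical operator-wide defaults.
        Map<String, Duration> defaults = Map.of(
                "operator.observer.flink.client.timeout", Duration.ofSeconds(10),
                "operator.reconciler.flink.cancel.job.timeout", Duration.ofMinutes(1));

        System.out.println(effectiveConfig(defaults,
                Map.of("operator.observer.flink.client.timeout", Duration.ofSeconds(30))));
        // prints {operator.observer.flink.client.timeout=PT30S, operator.reconciler.flink.cancel.job.timeout=PT1M}

        try {
            effectiveConfig(defaults,
                    Map.of("operator.observer.flink.client.timeout", Duration.ofHours(1)));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
            // prints Rejected: operator.observer.flink.client.timeout must be between 0 and PT5M, got PT1H
        }
    }
}
```

Rejecting out-of-range values (rather than clamping them) makes a misconfigured deployment fail visibly instead of running with a timeout the user did not ask for; either choice fits "validate within pre-configured bounds".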
So the question is more about the features that we are introducing going
forward :)

Cheers,
Gyula

On Tue, Apr 5, 2022 at 6:51 AM Thomas Weise <t...@apache.org> wrote:

> Hi Gyula,
>
> Can you please specify which of the current settings you see as
> candidates to expose to the user? Force upgrade definitely makes sense,
> and there will likely be others going forward that govern the upgrade
> or shutdown behavior. The main consideration should probably be that
> settings that can be controlled through the deployment should not
> affect other deployments.
>
> I would also prefer to add these to flinkConfiguration vs.
> introducing a separate operatorConfiguration.
>
> Thanks,
> Thomas
>
>
> On Sat, Apr 2, 2022 at 12:21 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
> >
> > That's a very good point Matyas: we cannot risk any interference with
> > other jobs, but I think we don't necessarily have to.
> >
> > First of all, we should only allow users to override selected configs.
> > To decide what to allow, we can separate the operator-related configs
> > into 2 main groups:
> >
> > Group 1: Config options that are specific to the reconciliation logic of
> > a specific job, such as feature flags (for example
> > https://issues.apache.org/jira/browse/FLINK-26926).
> > These configs cannot possibly cause interference; they are part of the
> > natural reconciliation logic.
> >
> > Group 2: Config options that actually affect the controller scheduling
> > and memory/CPU requirements. These are the problematic ones, as they can
> > actually break the operator if we are not careful.
> >
> > For Group 1 no safeguards are necessary, and I would say this is the
> > primary use case I wanted to cover with this discussion.
> >
> > I think Group 2 could also be supported as long as we specifically
> > validate the values, for example that scheduling delays are within
> > pre-configured bounds.
> > One example would be configuring client timeouts: there could be
> > special cases where the operator's hardcoded timeout is not good
> > enough, but we also want to set a hard maximum bound on the
> > configurable value.
> >
> > Cheers,
> > Gyula
> >
> >
> > On Sat, Apr 2, 2022 at 8:57 AM Őrhidi Mátyás <matyas.orh...@gmail.com> wrote:
> >>
> >> Thanks Gyula for bringing this topic up! Although the suggestion would
> >> indeed simplify the configuration handling, I have some concerns about
> >> opening the operator configuration to end users in certain cases. In a
> >> multi-tenant scenario, for example, how could we protect against one
> >> user messing up the configs and potentially disrupting others? As I see
> >> it, the operator acts as the control plane, ideally totally transparent
> >> to end users, often behind a REST API. Let me know what you think.
> >>
> >> Cheers,
> >> Matyas
> >>
> >> On Sat, Apr 2, 2022 at 5:12 AM Yang Wang <danrtsey...@gmail.com> wrote:
> >>
> >> > I also like proposal 2. Maybe it could be named
> >> > *KubernetesOperatorConfigOptions*, which looks just like all the other
> >> > ConfigOptions classes (e.g. *KubernetesConfigOptions*,
> >> > *YarnConfigOptions*) in Flink.
> >> > Proposal 2 is more natural and easier to use for Flink users.
> >> >
> >> >
> >> > Best,
> >> > Yang
> >> >
> >> > On Sat, Apr 2, 2022 at 02:25, Gyula Fóra <gyf...@apache.org> wrote:
> >> >
> >> >> Hi Devs!
> >> >>
> >> >> *Background*:
> >> >> With more and more features and options added to the Flink Kubernetes
> >> >> operator, it would make sense not to expose everything as first-class
> >> >> options in the deployment/jobspec (same as we do for the Flink
> >> >> configuration currently).
> >> >>
> >> >> Furthermore, it would be beneficial if users could control
> >> >> reconciliation-specific settings like timeouts, reschedule delays,
> >> >> etc. on a per-deployment basis.
> >> >>
> >> >>
> >> >> *Proposal 1*: The more conservative proposal would be to add a new
> >> >> *operatorConfiguration* field to the deployment spec that the operator
> >> >> would use during the controller loop (merged with the default operator
> >> >> config). This makes the operator very extensible with new options and
> >> >> would also allow overrides to the default operator config on a
> >> >> per-deployment basis.
> >> >>
> >> >>
> >> >> *Proposal 2*: I would actually go one step further and propose that we
> >> >> should merge *flinkConfiguration* and *operatorConfiguration*, as
> >> >> whether something affects the Flink job submission/job or the operator
> >> >> behaviour does not really make a difference to the end user. For
> >> >> users, the operator is part of Flink, so having multiple configuration
> >> >> maps could simply cause confusion.
> >> >> We could simply prefix all operator-related configs with
> >> >> `kubernetes.operator` to ensure that we do not accidentally conflict
> >> >> with Flink native config options.
> >> >> If we go this route, I would even go as far as naming it simply
> >> >> *configuration* for the sake of simplicity.
> >> >>
> >> >> I personally would go with proposal 2 to make this as simple as
> >> >> possible for the users.
> >> >>
> >> >> Please let me know what you think!
> >> >> Gyula
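A minimal sketch of how Proposal 2 could be handled on the operator side, assuming a single user-facing configuration map. Keys with the `kubernetes.operator.` prefix are overlaid on the operator-wide defaults (the merge described in Proposal 1), and every other key is passed through as Flink configuration. The class and method names and the `kubernetes.operator.reconciler.reschedule.interval` key are hypothetical, and plain `Map<String, String>` stands in for Flink's `Configuration` class.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of Proposal 2: one configuration map, split by key prefix. */
public class MergedConfiguration {

    // Prefix that marks operator settings inside the merged map.
    static final String OPERATOR_PREFIX = "kubernetes.operator.";

    /** Operator settings: operator-wide defaults overlaid with prefixed deployment keys. */
    static Map<String, String> operatorConfig(
            Map<String, String> operatorDefaults, Map<String, String> deploymentConfig) {
        Map<String, String> result = new TreeMap<>(operatorDefaults);
        deploymentConfig.forEach((key, value) -> {
            if (key.startsWith(OPERATOR_PREFIX)) {
                result.put(key, value); // per-deployment value wins over the default
            }
        });
        return result;
    }

    /** Flink-native settings: every key without the operator prefix. */
    static Map<String, String> flinkConfig(Map<String, String> deploymentConfig) {
        Map<String, String> result = new TreeMap<>();
        deploymentConfig.forEach((key, value) -> {
            if (!key.startsWith(OPERATOR_PREFIX)) {
                result.put(key, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical operator default and a deployment's merged configuration map.
        Map<String, String> defaults = Map.of(
                "kubernetes.operator.reconciler.reschedule.interval", "60 s");
        Map<String, String> deployment = Map.of(
                "taskmanager.numberOfTaskSlots", "2",
                "kubernetes.operator.reconciler.reschedule.interval", "15 s");

        System.out.println(operatorConfig(defaults, deployment));
        // prints {kubernetes.operator.reconciler.reschedule.interval=15 s}
        System.out.println(flinkConfig(deployment));
        // prints {taskmanager.numberOfTaskSlots=2}
    }
}
```

Matching on the full `kubernetes.operator.` prefix, not just `kubernetes.`, matters here: Flink's own native Kubernetes options (such as `kubernetes.cluster-id`) already start with `kubernetes.` and must still end up in the Flink part.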