Thanks for starting this discussion, Ingo! I like the flexibility that lazy evaluation gives us in terms of supported EV formats. However, I am a bit concerned that it is quite hard to compute the configurational ground truth without knowing all requested config keys. Moreover, it makes the configuration "more" mutable. One pattern which would no longer be possible with this approach is that Flink decides on some configuration values for the user and sets them in the configuration, because there might always be an EV which overwrites these values later.
I like to think of the configuration as something immutable (I know this is not the case, technically speaking) which we load/create at the beginning of a Flink process and then pass to the different components. That way, it is also quite easy to inform the user how the system is configured (e.g. by displaying the effective configuration in the web UI). Moreover, there are fewer surprises if there is an exported env variable (e.g. from a previous run) which overwrites a configuration value but has been forgotten about.

I don't have a very good idea for a bijective mapping, but if we could come up with one, then I'd be in favour of this approach because it allows us to create an "immutable" configuration which is created at the beginning of the process.

One idea for option 2) is to look up all ConfigOption instances on the classpath via reflection to create the full set of keys.

Cheers,
Till

On Thu, Jan 21, 2021 at 8:58 AM Ingo Bürk <i...@ververica.com> wrote:

> Hi everyone,
>
> I've now started a FLIP and am opening this discussion thread. Very much
> looking forward to your feedback!
>
> FLIP: https://cwiki.apache.org/confluence/x/ngtRCg
>
> The first big point I'd like to discuss is about the mechanism of "when"
> the EVs (environment variables) are looked up. I'll give three approaches
> here, the first of which is currently in the FLIP but very much open for
> change, and of course I'm happy to hear about different ideas entirely as
> well.
>
> 1) Lazy evaluation
>
> Only look up the EVs when an actual config key is requested from
> Configuration(#getRawValue), possibly with the addition of caching it once
> it has been looked up.
> The main benefit here is that no a-priori knowledge of available keys is
> required. The downside is that at no point in time we have complete
> knowledge of the configuration. This currently only really affects
> Configuration#keySet, but it does impose a limitation on future development
> worth considering. It also changes Configuration which is not limited to
> the Flink configuration, though this can easily be turned into an optional
> feature of Configuration.
>
> 2) Eager evaluation with full information
>
> If we centrally collect all possible Flink configuration keys in flink-core
> (quite a lot seem to be available already, but not all), we'd have complete
> information and could eagerly evaluate the environment, the precedence
> rules and populate the Configuration object accordingly. It also contains
> the implementation entirely to GlobalConfiguration only.
> The downside is, however, that this shifts the design a bit of having to
> know possible keys upfront. I'm also not sure how much effort it would be
> to collect all information in flink-core, or how "spread" this is across
> the codebase.
>
> 3) Eager evaluation through bijective mapping
>
> If we deviate from the Spring-style naming of the EVs we could potentially
> come up with a scheme that provides a bijection between EVs and config
> keys. If keys are further prefixed with something like FLINK_CONFIG_ (or
> anything to that extent), we could take all EVs with that prefix, map them
> to the corresponding config key name and eagerly populate Configuration.
> The main challenge is now defining this bijection, and we'd lose some
> "flexibility" in the naming of the EVs, so we'd end up with something like
> "$FLINK_CONFIG_s3__access_key", which arguably doesn't look very pretty.
>
> Happy to hear your thoughts on this!
>
>
> Regards
> Ingo
>
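To make option 3) a bit more concrete, here is a rough sketch of what one such bijection could look like. This is not what the FLIP proposes, and none of it is existing Flink code: the class name EnvKeyMapping, the method names and the escape codes ("__" for ".", "_h" for "-", "_u" for a literal "_") are all made up for illustration, and I am assuming that config keys otherwise only contain letters and digits. The point is only that treating "_" as an escape character makes the mapping reversible, so all EVs with the prefix can be collected once at startup.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Flink: reversible mapping between config keys and EV names. */
final class EnvKeyMapping {
    static final String PREFIX = "FLINK_CONFIG_";

    /** Encodes a config key; '_' acts as an escape character (assumed scheme). */
    static String toEnvName(String key) {
        StringBuilder sb = new StringBuilder(PREFIX);
        for (char c : key.toCharArray()) {
            switch (c) {
                case '.': sb.append("__"); break;
                case '-': sb.append("_h"); break;
                case '_': sb.append("_u"); break;
                default: sb.append(c); // letters and digits are kept as they are
            }
        }
        return sb.toString();
    }

    /** Decodes an EV name back into the config key; rejects names outside the scheme. */
    static String toConfigKey(String envName) {
        if (!envName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Missing prefix: " + envName);
        }
        String body = envName.substring(PREFIX.length());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '_') {
                sb.append(c);
                continue;
            }
            // Every '_' starts a two-character escape code.
            if (++i >= body.length()) {
                throw new IllegalArgumentException("Dangling escape in: " + envName);
            }
            switch (body.charAt(i)) {
                case '_': sb.append('.'); break;
                case 'h': sb.append('-'); break;
                case 'u': sb.append('_'); break;
                default: throw new IllegalArgumentException("Unknown escape in: " + envName);
            }
        }
        return sb.toString();
    }

    /** Eagerly collects all prefixed EVs, e.g. from System.getenv(), as config key/value pairs. */
    static Map<String, String> fromEnvironment(Map<String, String> env) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                result.put(toConfigKey(e.getKey()), e.getValue());
            }
        }
        return result;
    }
}
```

With this particular scheme "s3.access-key" would become "FLINK_CONFIG_s3__access_hkey", which is certainly not prettier than the example in the quoted mail. It also keeps the case of the key, so it would not be safe on platforms where EV names are case-insensitive.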