Thanks for starting the discussion, Ingo!

Regarding approach 1: I like the idea of having a mapping scheme from ConfigOption to env var(s), but I'm concerned about the implications of lazy evaluation. I think it would be preferable to keep the Configuration object as the source of truth, which requires us to do some form of eager evaluation.

Regarding approach 2: I don't think we can assume that we know all config option keys. For instance, I might write a custom high availability service or a custom FileSystem plugin that has its own config options. It would be a pity (but maybe tolerable) if env var config only worked with Flink's core options.

Regarding approach 3: What do you think about a mapping like

a) stripping the FLINK_CONFIG_ prefix,
b) replacing every single _ with a dot,
c) replacing every double __ with a hyphen,
d) lowercasing* everything?

Some examples for options that include both dots and hyphens:

akka.client-socket-worker-pool.pool-size-factor
=> FLINK_CONFIG_AKKA_CLIENT__SOCKET__WORKER__POOL_POOL__SIZE__FACTOR

high-availability.zookeeper.quorum
=> FLINK_CONFIG_HIGH__AVAILABILITY_ZOOKEEPER_QUORUM

It's not ideal, but it is easy to understand, assuming that dots and hyphens are the only special characters in config keys.

Regarding the lower-casing step above: ConfigOption keys seem to be case sensitive internally, but I couldn't find any user-facing documentation for this. There should be no options that depend on this behaviour. So if I'm not overlooking anything, I think it should be fine to make it case insensitive internally when accessing the raw value of a ConfigOption.

In addition, I think the FLIP should mention special cases such as env.java.opts that are evaluated in the bash scripts and not in the Java code.

Cheers,

Ufuk

On Thu, Jan 21, 2021, at 8:57 AM, Ingo Bürk wrote:
> Hi everyone,
>
> I've now started a FLIP and am opening this discussion thread. Very much
> looking forward to your feedback!
>
> FLIP: https://cwiki.apache.org/confluence/x/ngtRCg
>
> The first big point I'd like to discuss is about the mechanism of "when"
> the EVs (environment variables) are looked up. I'll give three approaches
> here, the first of which is currently in the FLIP but very much open for
> change, and of course I'm happy to hear about different ideas entirely as
> well.
>
> 1) Lazy evaluation
>
> Only look up the EVs when an actual config key is requested from
> Configuration(#getRawValue), possibly with the addition of caching it once
> it has been looked up.
> The main benefit here is that no a-priori knowledge of available keys is
> required. The downside is that at no point in time we have complete
> knowledge of the configuration. This currently only really affects
> Configuration#keySet, but it does impose a limitation on future development
> worth considering. It also changes Configuration which is not limited to
> the Flink configuration, though this can easily be turned into an optional
> feature of Configuration.
>
> 2) Eager evaluation with full information
>
> If we centrally collect all possible Flink configuration keys in flink-core
> (quite a lot seem to be available already, but not all), we'd have complete
> information and could eagerly evaluate the environment, the precedence
> rules and populate the Configuration object accordingly. It also contains
> the implementation entirely to GlobalConfiguration only.
> The downside is, however, that this shifts the design a bit of having to
> know possible keys upfront. I'm also not sure how much effort it would be
> to collect all information in flink-core, or how "spread" this is across
> the codebase.
>
> 3) Eager evaluation through bijective mapping
>
> If we deviate from the Spring-style naming of the EVs we could potentially
> come up with a scheme that provides a bijection between EVs and config
> keys.
> If keys are further prefixed with something like FLINK_CONFIG_ (or
> anything to that extent), we could take all EVs with that prefix, map them
> to the corresponding config key name and eagerly populate Configuration.
> The main challenge is now defining this bijection, and we'd lose some
> "flexibility" in the naming of the EVs, so we'd end up with something like
> "$FLINK_CONFIG_s3__access_key", which arguably doesn't look very pretty.
>
> Happy to hear your thoughts on this!
>
>
> Regards
> Ingo
>
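
PS: To make the approach 3 mapping from my reply concrete, here is a minimal Java sketch. It is only an illustration under the assumptions discussed in this thread (dots and hyphens are the only special characters in config keys, and keys can be treated as lower case). The class and method names are hypothetical and not part of Flink.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; nothing here is an existing Flink API.
final class EnvVarConfigMapping {

    // Prefix as proposed in this thread.
    static final String PREFIX = "FLINK_CONFIG_";

    /**
     * Maps an env var name to a config key, e.g.
     * FLINK_CONFIG_HIGH__AVAILABILITY_ZOOKEEPER_QUORUM
     * becomes high-availability.zookeeper.quorum.
     */
    public static String toConfigKey(String envVar) {
        if (!envVar.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Missing prefix " + PREFIX + ": " + envVar);
        }
        String stripped = envVar.substring(PREFIX.length());
        // Double underscores are handled first: replacing single
        // underscores first would turn "__" into "..".
        return stripped
                .replace("__", "-")
                .replace("_", ".")
                .toLowerCase(Locale.ROOT);
    }

    /** Eagerly collects all prefixed env vars into a key/value map. */
    public static Map<String, String> fromEnvironment(Map<String, String> env) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(toConfigKey(entry.getKey()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the config keys derived from the current process environment.
        System.out.println(fromEnvironment(System.getenv()));
    }
}
```

The resulting map could then be merged into the Configuration object once at startup, which keeps Configuration as the source of truth. How precedence against flink-conf.yaml should work is left open here.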