Hi everyone, I've updated the FLIP (https://cwiki.apache.org/confluence/x/ngtRCg) according to these discussions.

Regards
Ingo

On Thu, Jan 21, 2021 at 11:37 AM Ingo Bürk <i...@ververica.com> wrote:
> Hi Ufuk, Till,
>
> I definitely agree that having the Configuration be (or at least feel) immutable and complete seems like a better choice, and it is probably worth the trade-off in EV naming flexibility. Let me reshape the FLIP to propose something along the lines of solution (3).
>
> Regarding env.java.opts, what special handling is needed there? AFAICT only the rejected alternative of substituting values would've had an effect on this.
>
> Regards
> Ingo
>
> On Thu, Jan 21, 2021 at 11:13 AM Ufuk Celebi <u...@apache.org> wrote:
>> Thanks for starting the discussion, Ingo!
>>
>> Regarding approach 1:
>>
>> I like the idea of having a mapping scheme from ConfigOption to env var(s), but I'm concerned about the implications of lazy eval. I think it would be preferable to keep the Configuration object as the source of truth, requiring us to do some form of eager evaluation.
>>
>> Regarding approach 2:
>>
>> I don't think we can assume that we know all config option keys. For instance, I might write a custom high availability service or a custom FileSystem plugin that has its own config options. It would be a pity (but maybe tolerable) if env var config only worked with Flink's core options.
>>
>> Regarding approach 3:
>>
>> What do you think about a mapping like
>> a) stripping the FLINK_CONFIG_ prefix,
>> b) replacing every __ with a hyphen,
>> c) replacing every remaining _ with a dot,
>> d) lowercasing* everything?
>>
>> Some examples for options that include both dots and hyphens:
>>
>> akka.client-socket-worker-pool.pool-size-factor =>
>> FLINK_CONFIG_AKKA_CLIENT__SOCKET__WORKER__POOL_POOL__SIZE__FACTOR
>>
>> high-availability.zookeeper.quorum =>
>> FLINK_CONFIG_HIGH__AVAILABILITY_ZOOKEEPER_QUORUM
>>
>> It's not ideal, but easy to understand assuming that dots and hyphens are the only special characters in config keys.
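The naming scheme Ufuk proposes for approach 3 can be sketched in a few lines of Java. This is a minimal, hypothetical helper (the class `EnvVarKeyMapping` and its methods are illustrative names, not Flink API). One detail the sketch makes explicit: the double underscore has to be rewritten before the single one, otherwise every `__` would first turn into `..` and the hyphen rule would never fire. It also assumes config keys contain only letters, digits, dots and single hyphens; a key containing `_` or relying on upper case would not survive the round trip.

```java
import java.util.Locale;

/** Hypothetical helper sketching the proposed FLINK_CONFIG_ naming scheme (not a Flink API). */
public final class EnvVarKeyMapping {

    public static final String PREFIX = "FLINK_CONFIG_";

    private EnvVarKeyMapping() {}

    /** Maps an env var name to a config key; returns null if the name lacks the prefix. */
    public static String toConfigKey(String envVarName) {
        if (!envVarName.startsWith(PREFIX)) {
            return null;
        }
        String rest = envVarName.substring(PREFIX.length()); // a) strip the prefix
        return rest.replace("__", "-")                       // "__" -> "-" (must run before "_")
                .replace("_", ".")                           // remaining "_" -> "."
                .toLowerCase(Locale.ROOT);                   // d) lowercase everything
    }

    /** Inverse direction: derives the env var name for a given config key. */
    public static String toEnvVarName(String configKey) {
        return PREFIX + configKey.replace("-", "__")
                .replace(".", "_")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints akka.client-socket-worker-pool.pool-size-factor
        System.out.println(toConfigKey("FLINK_CONFIG_AKKA_CLIENT__SOCKET__WORKER__POOL_POOL__SIZE__FACTOR"));
        // prints FLINK_CONFIG_HIGH__AVAILABILITY_ZOOKEEPER_QUORUM
        System.out.println(toEnvVarName("high-availability.zookeeper.quorum"));
    }
}
```

Both printed values reproduce the two examples from the mail, which is a quick way to check that the rule order is right.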
>>
>> Regarding the lower-casing step above: ConfigOption keys seem to be case-sensitive internally, but I couldn't find any user-facing documentation for this. There should be no options that depend on this behaviour. So if I'm not overlooking anything, I think it should be fine to make it case-insensitive internally when accessing the raw value of a ConfigOption.
>>
>> In addition, I think the FLIP should mention special cases such as env.java.opts that are evaluated in the bash scripts and not in the Java code.
>>
>> Cheers,
>>
>> Ufuk
>>
>> On Thu, Jan 21, 2021, at 8:57 AM, Ingo Bürk wrote:
>> > Hi everyone,
>> >
>> > I've now started a FLIP and am opening this discussion thread. Very much looking forward to your feedback!
>> >
>> > FLIP: https://cwiki.apache.org/confluence/x/ngtRCg
>> >
>> > The first big point I'd like to discuss is the mechanism of "when" the EVs (environment variables) are looked up. I'll give three approaches here, the first of which is currently in the FLIP but very much open for change, and of course I'm happy to hear about different ideas entirely as well.
>> >
>> > 1) Lazy evaluation
>> >
>> > Only look up the EVs when an actual config key is requested from Configuration (#getRawValue), possibly caching the value once it has been looked up.
>> > The main benefit here is that no a-priori knowledge of available keys is required. The downside is that at no point in time do we have complete knowledge of the configuration. This currently only really affects Configuration#keySet, but it does impose a limitation on future development worth considering. It also changes Configuration, which is not limited to the Flink configuration, though this can easily be turned into an optional feature of Configuration.
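To make the trade-off of approach 1 concrete, here is a rough, hypothetical sketch of lazy lookup with caching. `LazyEnvConfiguration` is an illustrative stand-in, not Flink's `Configuration`; it assumes an env var overrides the value from the config file (the thread does not fix the precedence) and reuses the `FLINK_CONFIG_` naming idea for the key-to-variable direction. The `keySet()` method shows the limitation Ingo describes: a key that exists only in the environment stays invisible until someone asks for it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical sketch of lazy env var evaluation (not Flink's Configuration class).
 * Assumes env vars take precedence over file values. Not thread-safe.
 */
public final class LazyEnvConfiguration {

    private final Map<String, String> fileValues;
    private final Function<String, String> envLookup; // e.g. System::getenv
    private final Map<String, Optional<String>> envCache = new HashMap<>();
    private int envLookups = 0; // counts real env lookups, to make the caching visible

    public LazyEnvConfiguration(Map<String, String> fileValues, Function<String, String> envLookup) {
        this.fileValues = new HashMap<>(fileValues);
        this.envLookup = envLookup;
    }

    /** The environment is consulted only when a key is requested; hits and misses are cached. */
    public Optional<String> getRawValue(String key) {
        Optional<String> fromEnv = envCache.computeIfAbsent(key, k -> {
            envLookups++;
            return Optional.ofNullable(envLookup.apply(toEnvVarName(k)));
        });
        return fromEnv.isPresent() ? fromEnv : Optional.ofNullable(fileValues.get(key));
    }

    /** Only file-backed keys are known here; env-only keys are missing until requested. */
    public Set<String> keySet() {
        return Collections.unmodifiableSet(fileValues.keySet());
    }

    public int envLookupCount() {
        return envLookups;
    }

    // Hypothetical naming scheme: "-" -> "__", "." -> "_", uppercase, FLINK_CONFIG_ prefix.
    private static String toEnvVarName(String key) {
        return "FLINK_CONFIG_" + key.replace("-", "__").replace(".", "_").toUpperCase(Locale.ROOT);
    }
}
```

Because the lookup starts from a known key, camelCase keys such as `taskmanager.numberOfTaskSlots` map cleanly to an env var name; the catch is that nothing can enumerate env-provided keys up front, which is exactly why Ufuk prefers eager evaluation.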
>> >
>> > 2) Eager evaluation with full information
>> >
>> > If we centrally collect all possible Flink configuration keys in flink-core (quite a lot seem to be available already, but not all), we'd have complete information and could eagerly evaluate the environment and the precedence rules and populate the Configuration object accordingly. It also confines the implementation entirely to GlobalConfiguration.
>> > The downside, however, is that this shifts the design a bit towards having to know all possible keys upfront. I'm also not sure how much effort it would be to collect all information in flink-core, or how "spread" this is across the codebase.
>> >
>> > 3) Eager evaluation through bijective mapping
>> >
>> > If we deviate from the Spring-style naming of the EVs, we could potentially come up with a scheme that provides a bijection between EVs and config keys. If keys are further prefixed with something like FLINK_CONFIG_ (or anything to that effect), we could take all EVs with that prefix, map them to the corresponding config key name and eagerly populate Configuration.
>> > The main challenge is now defining this bijection, and we'd lose some "flexibility" in the naming of the EVs, so we'd end up with something like "$FLINK_CONFIG_s3__access_key", which arguably doesn't look very pretty.
>> >
>> > Happy to hear your thoughts on this!
>> >
>> > Regards
>> > Ingo
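Approach 3, the one the thread converges on, could populate the configuration eagerly roughly as below. This is a hedged sketch: `EagerEnvConfigLoader` is a hypothetical name, a plain `Map` stands in for Flink's `Configuration`, environment values are assumed to override file values, and the key mapping follows Ufuk's rules (prefix stripped, `__` to hyphen, remaining `_` to dot, lowercased). After `load` returns, every key is known, so a `keySet()` over the result is complete.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of eager env var evaluation (approach 3). A Map stands in for
 * Flink's Configuration; env values are assumed to override config file values.
 */
public final class EagerEnvConfigLoader {

    public static final String PREFIX = "FLINK_CONFIG_";

    private EagerEnvConfigLoader() {}

    public static Map<String, String> load(Map<String, String> fileValues, Map<String, String> env) {
        Map<String, String> result = new TreeMap<>(fileValues);
        for (Map.Entry<String, String> e : env.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(PREFIX) || name.length() == PREFIX.length()) {
                continue; // not a config variable (or an empty key)
            }
            String key = name.substring(PREFIX.length())
                    .replace("__", "-")      // "__" -> "-" first
                    .replace("_", ".")       // remaining "_" -> "."
                    .toLowerCase(Locale.ROOT);
            result.put(key, e.getValue());   // assumed precedence: env wins over file
        }
        return result;
    }

    public static void main(String[] args) {
        // In a launcher the environment would come from System.getenv(); only keys are printed.
        Map<String, String> conf = load(Map.of("parallelism.default", "1"), System.getenv());
        conf.keySet().forEach(System.out::println);
    }
}
```

The lowercasing is where this sketch leans on Ufuk's case-insensitivity remark: `FLINK_CONFIG_TASKMANAGER_NUMBEROFTASKSLOTS` becomes `taskmanager.numberoftaskslots`, which only matches `taskmanager.numberOfTaskSlots` (and only overrides a file entry with that spelling) if raw-value lookups ignore case.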