Hi Timo and Dawid, Thanks for the patient reply. I agree that both option a) and option b) can solve the mutability problem.
For option a), is it a little intrusive to add a duplicate() method for a Configurable? It would be great if we don't put this burden on users if possible. For option b), I actually feel it is slightly better than option a) from API perspective as getFactory() seems a more understandable method of a Configurable compared with duplicate(). And users do not need to implement much more logic. I am curious what is the downside of keeping the Configuration simple to only have primitive types, and always create the Configurable using a util method? If Configurables are rare, do we need to put the instantiation / bookkeeping logic in Configuration? @Becket for the toConfiguration this is required for shipping the > Configuration to TaskManager, so that we do not have to use java > serializability. Do you mean a Configurable may be created and configured directly without reading settings from a Configuration instance? I thought a Configurable will always be created via a ConfigurableFactory by extracting required configs from a Configuration. Am I missing something? Thanks, Jiangjie (Becket) Qin On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote: > Hi Timo, Becket > > From the options that Timo suggested for improving the mutability > situation I would prefer option a) as this is the more explicit option > and simpler option. Just as a remark, I think in general Configurable > types for options will be rather very rare for some special use-cases, > as the majority of options are rather simpler parameters of primitive > types (+duration, memory) > > @Becket for the toConfiguration this is required for shipping the > Configuration to TaskManager, so that we do not have to use java > serializability. > > Best, > > Dawid > > > On 02/09/2019 10:05, Timo Walther wrote: > > Hi Becket, > > > > Re 1 & 3: "values in configurations should actually be immutable" > > > > I would also prefer immutability but most of our configuration is > > mutable due to serialization/deserialization. Also maps and list could > > be mutable in theory. It is difficult to really enforce that for > > nested structures. I see two options: > > > > a) For the original design: How about we force implementers to add a > > duplicate() method in a Configurable object? This would mean that the > > object is still mutable but by duplicating the object both during > > reading and writing we would avoid the problem you described. > > > > b) For the current design: We still use the factory approach but let a > > Configurable object implement a getFactory() method such that we know > > how to serialize the object. With the help of a factory we can also > > duplicate the object easily during reading and writing and ensure > > immutability. > > > > I would personally go for approach a) to not over-engineer this topic. > > But I'm open for option b). > > > > Regards, > > Timo > > > > > > On 31.08.19 04:09, Becket Qin wrote: > >> Hi Timo, > >> > >> Thanks for the reply. I am still a little concerned over the > >> mutability of > >> the Configurable which could be the value in Configuration. > >> > >> Re: 1 > >> > >>> But in general, people should not use any internal fields. > >>> Configurable objects are meant for simple little helper POJOs, not > >>> complex arbitrary nested data structures. > >> This seems difficult to enforce... Ideally the values in configurations > >> should actually be immutable. The value can only be changed by > >> explicitly > >> calling setters in Configuration. Otherwise we may have weird situation > >> where the Configurable in the same configuration are different in two > >> places because the configurable is modified in one places and not > >> modified > >> in another place. So I am a little concerned on putting a > >> Configurable type > >> in the Configuration map, because the value could be modified without > >> explicitly setting the configuration. For example, can users do the > >> following? > >> > >> Configurable configurable = > >> configuration.getConfigurable(myConfigurableOption); > >> configurable.setConfigA(123); // this already changes the configurable > >> object in the configuration. > >> > >> Re: 2 > >> Thanks for confirming. As long as users will not have a situation where > >> they need to set two configurations with the same key but different > >> descriptions, I think it is OK. > >> > >> Re: 3 > >> This is actually kind of related to 1. i.e. Whether toConfiguration() > >> guarantees the exact same object can be rebuilt from the > >> configuration or > >> not. I am still not quite sure about the use case of toConfiguration() > >> though. It seems indicating the Configurable is mutable, which might be > >> dangerous. > >> > >> Thanks, > >> > >> Jiangjie (Becket) Qin > >> > >> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <twal...@apache.org> > >> wrote: > >> > >>> Hi Becket, > >>> > >>> 1. First of all, you are totally right. The FLIP contains a bug due to > >>> the last minute changes that Dawid suggested: by having immutable > >>> objects created by a factory we loose the serializability of the > >>> Configuration because the factory itself is not stored in the > >>> Configuration. I would propose to revert the last change and stick to > >>> the original design, which means that a object must implement the > >>> Configurable interface and also implements > >>> serialization/deserialization > >>> methods such that also internal fields can be persisted as you > >>> suggested. But in general, people should not use any internal fields. > >>> Configurable objects are meant for simple little helper POJOs, not > >>> complex arbitrary nested data structures. > >>> > >>> It is Map<String, Object> because Configuration stores the raw objects. > >>> If you put a Boolean option into it, it remains Boolean. This makes the > >>> map very efficient for shipping to the cluster and accessing options > >>> multiple times. The same for configurable objects. We put the pure > >>> objects into the map without any serialization/deserialization. The > >>> provided factory allows to convert the Object into a Configuration and > >>> we know how to serialize/deserializise a configuration because it is > >>> just a key/value map. > >>> > >>> 2. Yes, this is what we had in mind. It should still be the same > >>> configuration option. We would like to avoid specialized option keys > >>> across components (exec.max-para and table.exec.max-para) if they are > >>> describing basically the same thing. But adding some more description > >>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" > >>> does not hurt. > >>> > >>> 3. They should restore the original object given that the > >>> toConfiguration/fromConfiguration methods have been implemented > >>> correctly. I will extend the example to make the logic clearer while > >>> fixing the bug. > >>> > >>> Thanks for the healthy discussion, > >>> Timo > >>> > >>> > >>> On 30.08.19 15:29, Becket Qin wrote: > >>>> Hi Timo, > >>>> > >>>> Thanks again for the clarification. Please see a few more questions > >>> below. > >>>> Re: 1 > >>>> > >>>>> Please also keep in mind that Configuration must not consist of only > >>>>> strings, it manages a Map<String, Object> for efficient access. Every > >>>>> map entry can have a string representation for persistence, but in > >>>>> most > >>>>> cases consists of unserialized objects. > >>>> I'd like to understand this a bit more. The reason we have a > >>>> Map<String, > >>>> Object> in Configuration was because that Object could be either a > >>> String, > >>>> a List or a Map, right? But they eventually always boil down to > >>>> Strings, > >>> or > >>>> maybe one of the predefined type that we know how to serialize. In the > >>>> current design, can the Object also be Configurable? > >>>> If the value in the config Map<String, Object> can be Configurable > >>> objects, > >>>> how do we serialize them? Calling toConfiguration() seems not quite > >>> working > >>>> because there might be some other internal fields that are not part of > >>> the > >>>> configuration. The modification to those fields will be lost if we > >>>> simply > >>>> use toConfiguration(). So the impact of modifying those Configurable > >>>> objects seems a little undefined. And it would be difficult to prevent > >>>> users from doing that. > >>>> If the value in the config Map<String, Object> cannot be Configurable > >>>> objects, then it seems a little weird to have all the other ConfigType > >>>> stored in the ConfigMap in their own native type and accessed via > >>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be > >>>> different from others because one have to use ConfigurableFactory > >>>> to get > >>>> the configurations. > >>>> > >>>> Re: 2 > >>>> > >>>>> I think about the withExtendedDescription as a helper getter in a > >>>>> different place, so that the option is easier to find in a different > >>>>> module from it was defined. > >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>> equal > >>> to: > >>>>> public ConfigOption getMaxParallelismOption() { > >>>>> return CoreOptions.MAX_PARALLELISM; > >>>>> } > >>>> Just to make sure I understand it correctly, does that mean users will > >>> see > >>>> something like following? > >>>> - CoreOptions.MAX_PARALLELISM with description_1; > >>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. > >>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + > >>>> description_3. > >>>> And users will only configure exactly one MAX_PARALLELISM cross the > >>> board. > >>>> So they won't be confused by setting two MAX_PARALLELISM config for > >>>> two > >>>> different modules, while they are actually the same config. If that is > >>> the > >>>> case, I don't have further concern. > >>>> > >>>> Re: 3 > >>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() > >>>> actually > >>>> restore the original Object. But fromConfiguration() and > >>> toConfiguration() > >>>> does not do that, anything not in the configuration of the original > >>> object > >>>> will be lost. So it would be good to make that clear. Maybe a clear > >>>> Java > >>>> doc is sufficient. > >>>> > >>>> Thanks, > >>>> > >>>> Jiangjie (Becket) Qin > >>>> > >>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz > >>>> <dwysakow...@apache.org > >>>> > >>>> wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> Ad. 1 > >>>>> > >>>>> The advantage of our approach is that you have the type definition > >>>>> close > >>>>> to the option definition. The only difference is that it enables > >>>>> expressing simple pojos with the primitives like > >>>>> ConfigOption<Integer>, > >>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having > >>>>> > >>>>> the parsing logic scattered everywhere in the code base as it is now. > >>>>> The string representation in our proposal is exactly the same as you > >>>>> explained for those three options. The only difference is that you > >>>>> don't > >>>>> have to parse the elements of a List, Map etc. afterwards. > >>>>> > >>>>> Ad. 2 > >>>>> > >>>>> I think about the withExtendedDescription as a helper getter in a > >>>>> different place, so that the option is easier to find in a different > >>>>> module from it was defined. > >>>>> > >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be > >>>>> equal > >>> to: > >>>>> public ConfigOption getMaxParallelismOption() { > >>>>> > >>>>> return CoreOptions.MAX_PARALLELISM; > >>>>> > >>>>> } > >>>>> > >>>>> This allows to further clarify the description of the option in the > >>>>> context of a different module and end up in a seperate page in > >>>>> documentation (but with a link to the original one). In the end it is > >>>>> exactly the same option. It has exactly same key, type, parsing > >>>>> logic, > >>>>> it is in the end forwarded to the same place. > >>>>> > >>>>> Ad. 3 > >>>>> > >>>>> Not sure if I understand your concerns here. As Timo said it is in > >>>>> the > >>>>> end sth similar to toBytes/fromBytes, but it puts itself to a > >>>>> Configuration. Also just wanted to make sure we adjusted this part > >>>>> slightly and now the ConfigOption takes ConfigurableFactory. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 30/08/2019 09:39, Timo Walther wrote: > >>>>>> Hi Becket, > >>>>>> > >>>>>> thanks for the discussion. > >>>>>> > >>>>>> 1. ConfigOptions in their current design are bound to classes. > >>>>>> Regarding, the option is "creating some Configurable objects instead > >>>>>> of defining the config to create > >>>>>> those Configurable"? We just moved this logic to a factory, this > >>>>>> factory can then also be used for other purposes. However, how the > >>>>>> option and objects are serialized to Configuration is still not part > >>>>>> of the option. The option is just pure declaration. > >>>>>> > >>>>>> If we would allow only List<String>, implementers would need to > >>>>>> start > >>>>>> implementing own parsing and validation logic all the time. We would > >>>>>> like to avoid that. > >>>>>> > >>>>>> Please also keep in mind that Configuration must not consist of only > >>>>>> strings, it manages a Map<String, Object> for efficient access. > >>>>>> Every > >>>>>> map entry can have a string representation for persistence, but in > >>>>>> most cases consists of unserialized objects. > >>>>>> > >>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite > >>>>>> keys, types or default values. But different layers might want to > >>>>>> add > >>>>>> helpful information. In our concrete use case for FLIP-59, > >>>>>> ExecutionConfig has 50 properties and many of them are not relevant > >>>>>> for the Table layer or have no effect at all. We would like to list > >>>>>> and mention the most important config options again in the Table > >>>>>> Configuration section, so that users are not confused, but with a > >>>>>> strong link to the core option. E.g.: registered kryo serializers > >>>>>> are > >>>>>> also important also for Table users, we would like to add the > >>>>>> comment > >>>>>> "This option allows to modify the serialization of the ANY SQL data > >>>>>> type.". I think we should not spam the core configuration page with > >>>>>> comments from all layers, connectors, or libraries but keep this in > >>>>>> the corresponding component documentation. > >>>>>> > >>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes > >>>>>> and deserializes T from a configuration? > >>>>>> > >>>>>> Regards, > >>>>>> Timo > >>>>>> > >>>>>> On 29.08.19 19:14, Becket Qin wrote: > >>>>>>> Hi Timo and Stephan, > >>>>>>> > >>>>>>> Thanks for the detail explanation. > >>>>>>> > >>>>>>> 1. I agree that each config should be in a human readable > >>>>>>> format. My > >>>>>>> concern is that the current List<Configurable> looks going a little > >>>>>>> too far > >>>>>>> from what the configuration is supposed to do. They are essentially > >>>>>>> creating some Configurable objects instead of defining the > >>>>>>> config to > >>>>>>> create > >>>>>>> those Configurable. This mixes ConfigOption and the usage of it. > >>>>>>> API > >>>>>>> wise > >>>>>>> it would be good to keep the configs and their usages (such as > >>>>>>> how to > >>>>>>> create objects using the ConfigOption) apart from each other. > >>>>>>> I am wondering if we can just make List also only take string. For > >>>>>>> example, > >>>>>>> is the following definition of map and list sufficient? > >>>>>>> > >>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be > >>>>>>> defined > >>>>>>> as: > >>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... > >>>>>>> > >>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined > >>> as: > >>>>>>> list_config_name: v1, v2, v3, ... > >>>>>>> > >>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, > >>>>>>> String>>. It > >>>>>>> can > >>>>>>> be defined as: > >>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... > >>>>>>> > >>>>>>> All the key and values in the configuration are String. This also > >>>>>>> guarantees that the configuration is always serializable. > >>>>>>> If we want to do one more step, we can allow the ConfigOption to > >>>>>>> set > >>> all > >>>>>>> the primitive types and parse that for the users. So something like > >>>>>>> List<Integer>, List<Class<?>> seems fine. > >>>>>>> > >>>>>>> The configuration class could also have util methods to create a > >>>>>>> list > >>> of > >>>>>>> configurable such as: > >>>>>>> <T> List<T> > >>> <Configuration#getConfigurableInstances(ListMapConfigOption, > >>>>>>> Class<T> clazz). > >>>>>>> But the configuration class will not take arbitrary Configurable as > >>> the > >>>>>>> value of its config. > >>>>>>> > >>>>>>> 2. I might have misunderstood this. But my concern on the > >>>>>>> description > >>>>>>> extension is in the following example. > >>>>>>> > >>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = > >>>>>>> > >>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( > >>>>>>> "Note: That this property means that a table program has a > >>>>>>> side-effect > >>>>>>> XYZ."); > >>>>>>> > >>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is > >>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I > >>>>>>> suppose > >>>>>>> users > >>>>>>> will see both configurations. One with an extended description > >>>>>>> and one > >>>>>>> without. Let's say there is a third component which also users > >>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM > >>>>>>> ConfigOption? If > >>>>>>> so, what would that ConfigOption's description look like? > >>>>>>> > >>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM > >>>>>>> and the > >>>>>>> description should clearly state all the usage of this > >>>>>>> ConfigOption. > >>>>>>> > >>>>>>> 3. I see, in that case, how about we name it something like > >>>>>>> extractConfiguration()? I am just trying to see if we can make it > >>> clear > >>>>>>> this is not something like fromBytes() and toBytes(). > >>>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Jiangjie (Becket) Qin > >>>>>>> > >>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <twal...@apache.org> > >>>>> wrote: > >>>>>>>> Hi Becket, > >>>>>>>> > >>>>>>>> let me try to clarify some of your questions: > >>>>>>>> > >>>>>>>> 1. For every option, we also needed to think about how to > >>>>>>>> represent > >>> it > >>>>>>>> in a human readable format. We do not want to allow arbitrary > >>>>>>>> nesting > >>>>>>>> because that would easily allow to bypass the flattened > >>>>>>>> hierarchy of > >>>>>>>> config options (`session.memory.min`). The current design > >>>>>>>> allows to > >>>>>>>> represent every option type as a list. E.g.: > >>>>>>>> > >>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` > >>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: > >>>>>>>> field=12,other=true; field=12,other=true` > >>>>>>>> `myPropertyOption: key=str0,other=str1` can be > >>>>>>>> `myPropertyListOption: > >>>>>>>> key=str0,other=str1;key=str0,other=str1` > >>>>>>>> > >>>>>>>> We need the atomic class for serialization/deserialization both in > >>>>>>>> binary and string format. > >>>>>>>> > >>>>>>>> ConfigOption<List> is not present in the code base yet, but this > >>>>>>>> FLIP is > >>>>>>>> a preparation of making ExecutionConfig configurable. If you look > >>> into > >>>>>>>> this class or also in existing table connectors/formats, you > >>>>>>>> will see > >>>>>>>> that each proposed option type has its requirements. > >>>>>>>> > >>>>>>>> 2. Regarding extending the description of ConfigOptions, the > >>>>>>>> semantic of > >>>>>>>> one option should be a super set of the other option. E.g. in > >>>>>>>> Table > >>> API > >>>>>>>> we might use general ExecutionConfig properties. But we would like > >>>>>>>> to a) > >>>>>>>> make external options more prominent in the Table API config > >>>>>>>> docs to > >>>>>>>> link people to properties they should pay attention b) notice > >>>>>>>> about > >>>>>>>> side > >>>>>>>> effects. The core semantic of a property should not change. > >>>>>>>> > >>>>>>>> 3. The factory will not receive the entire configuration but works > >>> in a > >>>>>>>> separate key space. For `myObjectOption` above, it would receive a > >>>>>>>> configuration that consists of `field: 12` and `other: true`. > >>>>>>>> > >>>>>>>> I agree. I will convert the document into a Wiki page today. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: > >>>>>>>>> @Becket One thing that may be non-obvious is that the > >>>>>>>>> Configuration > >>>>>>>>> class > >>>>>>>>> also defines serialization / persistence logic at the moment. > >>>>>>>>> So it > >>>>>>>>> needs > >>>>>>>>> to know the set of types it supports. That stands in the way > >>>>>>>>> of an > >>>>>>>>> arbitrary generic map type. > >>>>>>>>> > >>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one > >>>>>>>>> collection orthogonal to the type (List) and another one bound to > >>>>>>>> specific > >>>>>>>>> types (Map). > >>>>>>>>> > >>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <becket....@gmail.com > > > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Timo, > >>>>>>>>>> > >>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I > >>>>>>>>>> have a > >>>>>>>>>> few > >>>>>>>>>> questions / comments. > >>>>>>>>>> > >>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? > >>>>>>>>>> Would it be enough to just check the atomicClass to see if it > >>>>>>>>>> is a > >>>>>>>>>> List > >>>>>>>> or > >>>>>>>>>> not? > >>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume > >>>>>>>>>> both key > >>>>>>>>>> and value types are String? Can we just apply the same to the > >>>>>>>>>> ConfigOption<List>? > >>>>>>>>>> BTW, I did a quick search in the codebase but did not find any > >>>>>>>>>> usage of > >>>>>>>>>> ConfigOption<List>. > >>>>>>>>>> > >>>>>>>>>> 2. The same config name, but with two ConfigOption with > >>>>>>>>>> different > >>>>>>>> semantic > >>>>>>>>>> in different component seems super confusing. For example, when > >>> users > >>>>>>>> set > >>>>>>>>>> both configs, they may have no idea one is overriding the other. > >>>>>>>>>> There > >>>>>>>>>> might be two cases: > >>>>>>>>>> - If it is just the same config used by different > >>>>>>>>>> components to > >>>>>>>>>> act > >>>>>>>>>> accordingly, it might be better to just have one config, but > >>> describe > >>>>>>>>>> clearly on how that config will be used. > >>>>>>>>>> - If it is actually two configurations that can be set > >>>>>>>>>> differently, I > >>>>>>>>>> think the config names should just be different. > >>>>>>>>>> > >>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() > >>> method > >>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() > >>>>>>>>>> method > >>>>>>>> sounds > >>>>>>>>>> like converting an object to a configuration, which only > >>>>>>>>>> works if > >>> the > >>>>>>>>>> object does not contain any state / value. I am also > >>>>>>>>>> wondering if > >>>>>>>>>> there > >>>>>>>> is > >>>>>>>>>> a real use case of this method. Because supposedly the > >>> configurations > >>>>>>>> could > >>>>>>>>>> just be passed around to caller of this method. > >>>>>>>>>> > >>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of > >>>>>>>>>> in the > >>>>>>>> Google > >>>>>>>>>> doc before voting? The FLIP wiki allows track the modification > >>>>>>>>>> history > >>>>>>>> and > >>>>>>>>>> has a more established structure to ensure nothing is missed. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> > >>>>>>>>>> Jiangjie (Becket) Qin > >>>>>>>>>> > >>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther > >>>>>>>>>> <twal...@apache.org> > >>>>>>>> wrote: > >>>>>>>>>>> Hi everyone, > >>>>>>>>>>> > >>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the > >>> voting > >>>>>>>>>>> thread. If there are no objections, I will start a new voting > >>> thread > >>>>>>>>>>> tomorrow at 9am Berlin time. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: > >>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>> > >>>>>>>>>>>> thanks for all the feedback we have received online and > >>>>>>>>>>>> offline. > >>> It > >>>>>>>>>>>> showed that many people support the idea of evolving the Flink > >>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP > >>>>>>>>>>>> will > >>>>>>>>>>>> not > >>>>>>>>>>>> solve all issues but at least will improve the current status. > >>>>>>>>>>>> > >>>>>>>>>>>> We've updated the document and replaced the Correlation part > >>>>>>>>>>>> with the > >>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available > >>>>>>>>>>>> options > >>>>>>>>>>>> of a group plus custom group validators for eager validation. > >>>>>>>>>>>> For now, > >>>>>>>>>>>> this eager group validation will only be used at certain > >>>>>>>>>>>> locations in > >>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire > >>>>>>>>>>>> global > >>>>>>>>>>>> configuration before submitting a job in the future. > >>>>>>>>>>>> > >>>>>>>>>>>> Please take another look if you find time. I hope we can > >>>>>>>>>>>> proceed > >>>>>>>>>>>> with > >>>>>>>>>>>> the voting process if there are no objections. > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> Timo > >>>>>>>>>>>> > >>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: > >>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>> > >>>>>>>>>>>>> thanks for your suggestions. Let me give you some background > >>> about > >>>>>>>>>>>>> the decisions made in this FLIP: > >>>>>>>>>>>>> > >>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" > >>>>>>>>>>>>> because we > >>> did > >>>>>>>>>>>>> not want to change the entire configuration infrastructure. > >>>>>>>>>>>>> Both for > >>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that > >>>>>>>>>>>>> would be > >>>>>>>>>>>>> required to update all options. If our goal is to rework the > >>>>>>>>>>>>> configuration option entirely, I might suggest to switch > >>>>>>>>>>>>> to JSON > >>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting > >>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the > >>> more > >>>>>>>>>>>>> nested structures are allowed. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is > >>>>>>>>>>>>> centered > >>>>>>>>>>>>> around Java classes where T is determined by the default > >>>>>>>>>>>>> value. > >>>>>>>>>>>>> The > >>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit > >>>>>>>>>>>>> `intType()` method etc. The current design of validators > >>> centered > >>>>>>>>>>>>> around Java classes makes it possible to have typical domain > >>>>>>>>>>>>> validators baked by generics as you suggested. If we > >>>>>>>>>>>>> introduce > >>>>>>>>>>>>> types > >>>>>>>>>>>>> such as "quantity with measure and unit" we still need to > >>>>>>>>>>>>> get a > >>>>>>>>>>>>> class > >>>>>>>>>>>>> out of this option at the end, so why changing a proven > >>>>>>>>>>>>> concept? > >>>>>>>>>>>>> > >>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary > >>>>>>>>>>>>> nesting. As > >>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every > >>>>>>>>>>>>> atomic > >>>>>>>>>>>>> option like "key=12" can be represented by a list > >>>>>>>>>>>>> "keys=12;13". > >>>>>>>>>>>>> But > >>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated > >>>>>>>>>>>>> list > >>>>>>>>>>>>> option > >>>>>>>>>>>>> would start making this more complicated such as > >>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), > >>>>>>>>>>>>> StringOption(...)))", do we want that? > >>>>>>>>>>>>> > >>>>>>>>>>>>> 4. Correlation: The correlation part is one of the > >>>>>>>>>>>>> suggestions > >>>>>>>>>>>>> that I > >>>>>>>>>>>>> like least in the document. We can also discuss removing it > >>>>>>>>>>>>> entirely, > >>>>>>>>>>>>> but I think it solves the use case of relating options > >>>>>>>>>>>>> with each > >>>>>>>>>>>>> other in a flexible way right next to the actual option. > >>>>>>>>>>>>> Instead of > >>>>>>>>>>>>> being hidden in some component initialization, we should > >>>>>>>>>>>>> put it > >>>>>>>>>>>>> close > >>>>>>>>>>>>> to the option to also perform validation eagerly instead of > >>>>>>>>>>>>> failing > >>>>>>>>>>>>> at runtime when the option is accessed the first time. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regards, > >>>>>>>>>>>>> Timo > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: > >>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The > >>>>>>>>>>>>>> idea is > >>>>>>>>>>>>>> to see > >>>>>>>>>>>>>> if something like that can ease validation. Especially the > >>>>>>>>>> correlation > >>>>>>>>>>>>>> system seems quite complex (proxies to work around order of > >>>>>>>>>>>>>> initialization). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> For example, let's assume we don't think primarily about > >>>>>>>>>>>>>> "java > >>>>>>>>>>>>>> types" but > >>>>>>>>>>>>>> would define types as one of the following (just examples, > >>>>>>>>>>>>>> haven't > >>>>>>>>>>>>>> thought > >>>>>>>>>>>>>> all the details through): > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (a) category type: implies string, and a fix set of > >>> possible > >>>>>>>>>> values. > >>>>>>>>>>>>>> Those would be passes and naturally make it into the docs > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>> Maps to a String or Enum in Java. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally > >>>>>>>>>>>>>> integer, > >>>>>>>> if > >>>>>>>>>>>>>> we want > >>>>>>>>>>>>>> to automatically check overflow / underflow). would take > >>> typical > >>>>>>>>>> domain > >>>>>>>>>>>>>> validators, like non-negative, etc. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (c) numeric real type: same as above (double or > >>>>>>>>>>>>>> float) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (d) numeric interval type: either defined as an > >>> interval, or > >>>>>>>>>>>>>> references > >>>>>>>>>>>>>> other parameter by key. validation by valid interval. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately > >>>>>>>>>>>>>> parsable. > >>> The > >>>>>>>>>>>>>> measure's > >>>>>>>>>>>>>> type could be any of the numeric types above, with same > >>>>>>>>>>>>>> validation > >>>>>>>>>>>>>> rules. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> With a system like the above, would we still correlation > >>>>>>>>>>>>>> validators? > >>>>>>>>>>>>>> Are > >>>>>>>>>>>>>> there still cases that we need to catch early (config > >>>>>>>>>>>>>> loading) > >>> or > >>>>>>>>>>>>>> are the > >>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup > >>> specific, > >>>>>>>>>>>>>> that it is > >>>>>>>>>>>>>> fine to handle them in component initialization? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz > >>>>>>>>>>>>>> <dwysakow...@apache.org> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Stephan, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thank you for your opinion. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the > >>>>>>>>>>>>>>> most of > >>>>>>>>>> the > >>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown > >>> type > >>>>>>>>>>>>>>> system, > >>>>>>>>>>>>>>> a field like isList may look weird. Please let me > >>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>> bit > >>>>>>>>>> more > >>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear > >>>>>>>>>>>>>>> enough > >>>>>>>>>>>>>>> about > >>>>>>>>>> it > >>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is > >>>>>>>>>>>>>>> that they > >>>>>>>>>>>>>>> must > >>>>>>>>>>>>>>> have a string representation as they might come from a > >>>>>>>> configuration > >>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so > >>>>>>>>>>>>>>> that the > >>>>>>>>>> values > >>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not > >>>>>>>>>>>>>>> want to > >>>>>>>>>>>>>>> add a > >>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for > >>>>>>>>>>>>>>> lists > >>>>>>>>>> only > >>>>>>>>>>>>>>> (and flat objects - I think though in the current design > >>>>>>>>>>>>>>> there is a > >>>>>>>>>>>>>>> mistake around the Configurable interface). I think though > >>>>>>>>>>>>>>> you have > >>>>>>>>>> a > >>>>>>>>>>>>>>> point here and it would be better to have a > >>>>>>>>>>>>>>> ListConfigOption > >>>>>>>>>>>>>>> instead of > >>>>>>>>>>>>>>> this field. Does it make sense to you? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I > >>>>>>>>>> understood > >>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from > >>>>>>>>>>>>>>> Configuration that means they can be bound to the generic > >>>>>>>>>>>>>>> parameter > >>>>>>>>>> of > >>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends > >>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the > >>>>>>>>>>>>>>> ConfigOption > >>>>>>>>>>>>>>> has anything to do with the validation logic. Could you > >>>>>>>>>>>>>>> elaborate a > >>>>>>>>>>>>>>> bit > >>>>>>>>>>>>>>> more what did you mean? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Dawid > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: > >>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do > >>> early > >>>>>>>>>>>>>>> validation. > >>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad > >>>>>>>>>>>>>>>> hoc, > >>>>>>>>>>>>>>>> though. For > >>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear > >>>>>>>>>>>>>>>> indication of > >>> not > >>>>>>>>>>>>>>>> having > >>>>>>>>>>>>>>>> thought through the type/category system. > >>>>>>>>>>>>>>>> Also, having a more clear category system makes validation > >>>>>>>> simpler. > >>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between > >>> numeric > >>>>>>>>>>>>>>> parameters > >>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible > >>>>>>>>>>>>>>>> values), > >>>>>>>>>>>>>>>> quantities > >>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), > >>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>> results in > >>>>>>>>>>>>>>> an > >>>>>>>>>>>>>>>> elegant system for validation. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < > >>>>>>>>>> lzljs3620...@aliyun.com > >>>>>>>>>>>>>>> .invalid> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. > >>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of > >>>>>>>>>>>>>>>>> various > >>>>>>>>>>>>>>>>> modules to be unified. > >>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new > >>>>>>>>>>>>>>>>> unified > >>>>>>>>>>>>>>>>> TableEnvironment available for production. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, > >>>>>>>>>>>>>>>>> such as > >>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The > >>>>>>>>>>>>>>>>> skew may > >>>>>>>>>>>>>>>>> be a single field or a combination of multiple > >>>>>>>>>>>>>>>>> fields. > >>>>>>>>>>>>>>>>> So the > >>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON > >>>>>>>>>>>>>>>>> string > >>> to > >>>>>>>>>>>>>>>>> configure it. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>>>>>>>>>>> From:Jark Wu <imj...@gmail.com> > >>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 > >>>>>>>>>>>>>>>>> To:dev <dev@flink.apache.org> > >>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and > >>>>>>>>>> Configuration > >>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a > >>> long > >>>>>>>>>> time. > >>>>>>>>>>>>>>>>> We have seen the benefit when developing blink > >>>>>>>>>>>>>>>>> configurations and > >>>>>>>>>>>>>>> connector > >>>>>>>>>>>>>>>>> properties in 1.9 release. > >>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed > >>>>>>>>>>>>>>>>> design. > >>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < > >>> wander4...@gmail.com > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which > >>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of > >>>>>>>>>>>>>>>>>> Flink. > >>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment > >>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by > >>>>>>>>>> configuration. > >>>>>>>>>>>>>>>>>> Will take a look at the document in days. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>> tison. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> 于2019年8月16日周五 > >>>>>>>>>>>>>>>>>> 下午10:12写道: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of > >>>>>>>>>>>>>>>>>>> ExecutionConfig and > >>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is > >>>>>>>>>>>>>>>>>>> necessary > >>>>>>>>>>>>>>>>>>> to make > >>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, > >>>>>>>>>>>>>>>>>>> with > >>> the > >>>>>>>>>>>>>>>>>>> new SQL > >>>>>>>>>>>>>>>>> DDL > >>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and > >>>>>>>>>>>>>>>>>>> formats > >>>>>>>>>>>>>>>>>>> coming up, > >>>>>>>>>>>>>>>>>>> unified configuration becomes more important. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> We need more features around string-based configuration > >>>>>>>>>>>>>>>>>>> in the > >>>>>>>>>>>>>>>>>>> future, > >>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose > >>>>>>>>>>>>>>>>>>> FLIP-54 for > >>>>>>>>>>>>>>>>>>> evolving > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>> > https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit > >>> > >>>>>>>>>>>>>>>>>>> In summary it adds: > >>>>>>>>>>>>>>>>>>> - documented types and validation > >>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list > >>>>>>>>>>>>>>>>>>> - simple non-nested object types > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Looking forward to your feedback, > >>>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>> > > > >