Hi Timo, Becket From the options that Timo suggested for improving the mutability situation I would prefer option a) as this is the more explicit option and simpler option. Just as a remark, I think in general Configurable types for options will be rather very rare for some special use-cases, as the majority of options are rather simpler parameters of primitive types (+duration, memory)
@Becket for the toConfiguration this is required for shipping the Configuration to TaskManager, so that we do not have to use java serializability. Best, Dawid On 02/09/2019 10:05, Timo Walther wrote: > Hi Becket, > > Re 1 & 3: "values in configurations should actually be immutable" > > I would also prefer immutability but most of our configuration is > mutable due to serialization/deserialization. Also maps and list could > be mutable in theory. It is difficult to really enforce that for > nested structures. I see two options: > > a) For the original design: How about we force implementers to add a > duplicate() method in a Configurable object? This would mean that the > object is still mutable but by duplicating the object both during > reading and writing we would avoid the problem you described. > > b) For the current design: We still use the factory approach but let a > Configurable object implement a getFactory() method such that we know > how to serialize the object. With the help of a factory we can also > duplicate the object easily during reading and writing and ensure > immutability. > > I would personally go for approach a) to not over-engineer this topic. > But I'm open for option b). > > Regards, > Timo > > > On 31.08.19 04:09, Becket Qin wrote: >> Hi Timo, >> >> Thanks for the reply. I am still a little concerned over the >> mutability of >> the Configurable which could be the value in Configuration. >> >> Re: 1 >> >>> But in general, people should not use any internal fields. >>> Configurable objects are meant for simple little helper POJOs, not >>> complex arbitrary nested data structures. >> This seems difficult to enforce... Ideally the values in configurations >> should actually be immutable. The value can only be changed by >> explicitly >> calling setters in Configuration. Otherwise we may have weird situation >> where the Configurable in the same configuration are different in two >> places because the configurable is modified in one places and not >> modified >> in another place. So I am a little concerned on putting a >> Configurable type >> in the Configuration map, because the value could be modified without >> explicitly setting the configuration. For example, can users do the >> following? >> >> Configurable configurable = >> configuration.getConfigurable(myConfigurableOption); >> configurable.setConfigA(123); // this already changes the configurable >> object in the configuration. >> >> Re: 2 >> Thanks for confirming. As long as users will not have a situation where >> they need to set two configurations with the same key but different >> descriptions, I think it is OK. >> >> Re: 3 >> This is actually kind of related to 1. i.e. Whether toConfiguration() >> guarantees the exact same object can be rebuilt from the >> configuration or >> not. I am still not quite sure about the use case of toConfiguration() >> though. It seems indicating the Configurable is mutable, which might be >> dangerous. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <twal...@apache.org> >> wrote: >> >>> Hi Becket, >>> >>> 1. First of all, you are totally right. The FLIP contains a bug due to >>> the last minute changes that Dawid suggested: by having immutable >>> objects created by a factory we loose the serializability of the >>> Configuration because the factory itself is not stored in the >>> Configuration. I would propose to revert the last change and stick to >>> the original design, which means that a object must implement the >>> Configurable interface and also implements >>> serialization/deserialization >>> methods such that also internal fields can be persisted as you >>> suggested. But in general, people should not use any internal fields. >>> Configurable objects are meant for simple little helper POJOs, not >>> complex arbitrary nested data structures. >>> >>> It is Map<String, Object> because Configuration stores the raw objects. >>> If you put a Boolean option into it, it remains Boolean. This makes the >>> map very efficient for shipping to the cluster and accessing options >>> multiple times. The same for configurable objects. We put the pure >>> objects into the map without any serialization/deserialization. The >>> provided factory allows to convert the Object into a Configuration and >>> we know how to serialize/deserializise a configuration because it is >>> just a key/value map. >>> >>> 2. Yes, this is what we had in mind. It should still be the same >>> configuration option. We would like to avoid specialized option keys >>> across components (exec.max-para and table.exec.max-para) if they are >>> describing basically the same thing. But adding some more description >>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >>> does not hurt. >>> >>> 3. They should restore the original object given that the >>> toConfiguration/fromConfiguration methods have been implemented >>> correctly. I will extend the example to make the logic clearer while >>> fixing the bug. >>> >>> Thanks for the healthy discussion, >>> Timo >>> >>> >>> On 30.08.19 15:29, Becket Qin wrote: >>>> Hi Timo, >>>> >>>> Thanks again for the clarification. Please see a few more questions >>> below. >>>> Re: 1 >>>> >>>>> Please also keep in mind that Configuration must not consist of only >>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>> map entry can have a string representation for persistence, but in >>>>> most >>>>> cases consists of unserialized objects. >>>> I'd like to understand this a bit more. The reason we have a >>>> Map<String, >>>> Object> in Configuration was because that Object could be either a >>> String, >>>> a List or a Map, right? But they eventually always boil down to >>>> Strings, >>> or >>>> maybe one of the predefined type that we know how to serialize. In the >>>> current design, can the Object also be Configurable? >>>> If the value in the config Map<String, Object> can be Configurable >>> objects, >>>> how do we serialize them? Calling toConfiguration() seems not quite >>> working >>>> because there might be some other internal fields that are not part of >>> the >>>> configuration. The modification to those fields will be lost if we >>>> simply >>>> use toConfiguration(). So the impact of modifying those Configurable >>>> objects seems a little undefined. And it would be difficult to prevent >>>> users from doing that. >>>> If the value in the config Map<String, Object> cannot be Configurable >>>> objects, then it seems a little weird to have all the other ConfigType >>>> stored in the ConfigMap in their own native type and accessed via >>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>>> different from others because one have to use ConfigurableFactory >>>> to get >>>> the configurations. >>>> >>>> Re: 2 >>>> >>>>> I think about the withExtendedDescription as a helper getter in a >>>>> different place, so that the option is easier to find in a different >>>>> module from it was defined. >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>> equal >>> to: >>>>> public ConfigOption getMaxParallelismOption() { >>>>> return CoreOptions.MAX_PARALLELISM; >>>>> } >>>> Just to make sure I understand it correctly, does that mean users will >>> see >>>> something like following? >>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>> description_3. >>>> And users will only configure exactly one MAX_PARALLELISM cross the >>> board. >>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>> two >>>> different modules, while they are actually the same config. If that is >>> the >>>> case, I don't have further concern. >>>> >>>> Re: 3 >>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>> actually >>>> restore the original Object. But fromConfiguration() and >>> toConfiguration() >>>> does not do that, anything not in the configuration of the original >>> object >>>> will be lost. So it would be good to make that clear. Maybe a clear >>>> Java >>>> doc is sufficient. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>> <dwysakow...@apache.org >>>> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> Ad. 1 >>>>> >>>>> The advantage of our approach is that you have the type definition >>>>> close >>>>> to the option definition. The only difference is that it enables >>>>> expressing simple pojos with the primitives like >>>>> ConfigOption<Integer>, >>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>>> >>>>> the parsing logic scattered everywhere in the code base as it is now. >>>>> The string representation in our proposal is exactly the same as you >>>>> explained for those three options. The only difference is that you >>>>> don't >>>>> have to parse the elements of a List, Map etc. afterwards. >>>>> >>>>> Ad. 2 >>>>> >>>>> I think about the withExtendedDescription as a helper getter in a >>>>> different place, so that the option is easier to find in a different >>>>> module from it was defined. >>>>> >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>> equal >>> to: >>>>> public ConfigOption getMaxParallelismOption() { >>>>> >>>>> return CoreOptions.MAX_PARALLELISM; >>>>> >>>>> } >>>>> >>>>> This allows to further clarify the description of the option in the >>>>> context of a different module and end up in a seperate page in >>>>> documentation (but with a link to the original one). In the end it is >>>>> exactly the same option. It has exactly same key, type, parsing >>>>> logic, >>>>> it is in the end forwarded to the same place. >>>>> >>>>> Ad. 3 >>>>> >>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>> the >>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>> >>>>> Best, >>>>> >>>>> Dawid >>>>> >>>>> >>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>> Hi Becket, >>>>>> >>>>>> thanks for the discussion. >>>>>> >>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>> Regarding, the option is "creating some Configurable objects instead >>>>>> of defining the config to create >>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>> factory can then also be used for other purposes. However, how the >>>>>> option and objects are serialized to Configuration is still not part >>>>>> of the option. The option is just pure declaration. >>>>>> >>>>>> If we would allow only List<String>, implementers would need to >>>>>> start >>>>>> implementing own parsing and validation logic all the time. We would >>>>>> like to avoid that. >>>>>> >>>>>> Please also keep in mind that Configuration must not consist of only >>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>> Every >>>>>> map entry can have a string representation for persistence, but in >>>>>> most cases consists of unserialized objects. >>>>>> >>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>> keys, types or default values. But different layers might want to >>>>>> add >>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>>> for the Table layer or have no effect at all. We would like to list >>>>>> and mention the most important config options again in the Table >>>>>> Configuration section, so that users are not confused, but with a >>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>> are >>>>>> also important also for Table users, we would like to add the >>>>>> comment >>>>>> "This option allows to modify the serialization of the ANY SQL data >>>>>> type.". I think we should not spam the core configuration page with >>>>>> comments from all layers, connectors, or libraries but keep this in >>>>>> the corresponding component documentation. >>>>>> >>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>>> and deserializes T from a configuration? >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>> Hi Timo and Stephan, >>>>>>> >>>>>>> Thanks for the detail explanation. >>>>>>> >>>>>>> 1. I agree that each config should be in a human readable >>>>>>> format. My >>>>>>> concern is that the current List<Configurable> looks going a little >>>>>>> too far >>>>>>> from what the configuration is supposed to do. They are essentially >>>>>>> creating some Configurable objects instead of defining the >>>>>>> config to >>>>>>> create >>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>> API >>>>>>> wise >>>>>>> it would be good to keep the configs and their usages (such as >>>>>>> how to >>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>> I am wondering if we can just make List also only take string. For >>>>>>> example, >>>>>>> is the following definition of map and list sufficient? >>>>>>> >>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>>> defined >>>>>>> as: >>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>> >>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >>> as: >>>>>>> list_config_name: v1, v2, v3, ... >>>>>>> >>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>> String>>. It >>>>>>> can >>>>>>> be defined as: >>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>> >>>>>>> All the key and values in the configuration are String. This also >>>>>>> guarantees that the configuration is always serializable. >>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>> set >>> all >>>>>>> the primitive types and parse that for the users. So something like >>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>> >>>>>>> The configuration class could also have util methods to create a >>>>>>> list >>> of >>>>>>> configurable such as: >>>>>>> <T> List<T> >>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>> Class<T> clazz). >>>>>>> But the configuration class will not take arbitrary Configurable as >>> the >>>>>>> value of its config. >>>>>>> >>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>> description >>>>>>> extension is in the following example. >>>>>>> >>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>> >>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>> "Note: That this property means that a table program has a >>>>>>> side-effect >>>>>>> XYZ."); >>>>>>> >>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>> suppose >>>>>>> users >>>>>>> will see both configurations. One with an extended description >>>>>>> and one >>>>>>> without. Let's say there is a third component which also users >>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>> ConfigOption? If >>>>>>> so, what would that ConfigOption's description look like? >>>>>>> >>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>>> and the >>>>>>> description should clearly state all the usage of this >>>>>>> ConfigOption. >>>>>>> >>>>>>> 3. I see, in that case, how about we name it something like >>>>>>> extractConfiguration()? I am just trying to see if we can make it >>> clear >>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Jiangjie (Becket) Qin >>>>>>> >>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <twal...@apache.org> >>>>> wrote: >>>>>>>> Hi Becket, >>>>>>>> >>>>>>>> let me try to clarify some of your questions: >>>>>>>> >>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>> represent >>> it >>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>> nesting >>>>>>>> because that would easily allow to bypass the flattened >>>>>>>> hierarchy of >>>>>>>> config options (`session.memory.min`). The current design >>>>>>>> allows to >>>>>>>> represent every option type as a list. E.g.: >>>>>>>> >>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>> `myPropertyListOption: >>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>> >>>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>>> binary and string format. >>>>>>>> >>>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>>> FLIP is >>>>>>>> a preparation of making ExecutionConfig configurable. If you look >>> into >>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>> will see >>>>>>>> that each proposed option type has its requirements. >>>>>>>> >>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>> semantic of >>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>> Table >>> API >>>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>>> to a) >>>>>>>> make external options more prominent in the Table API config >>>>>>>> docs to >>>>>>>> link people to properties they should pay attention b) notice >>>>>>>> about >>>>>>>> side >>>>>>>> effects. The core semantic of a property should not change. >>>>>>>> >>>>>>>> 3. The factory will not receive the entire configuration but works >>> in a >>>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>> >>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Timo >>>>>>>> >>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>> Configuration >>>>>>>>> class >>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>> So it >>>>>>>>> needs >>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>> of an >>>>>>>>> arbitrary generic map type. >>>>>>>>> >>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>>> specific >>>>>>>>> types (Map). >>>>>>>>> >>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <becket....@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Timo, >>>>>>>>>> >>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>> have a >>>>>>>>>> few >>>>>>>>>> questions / comments. >>>>>>>>>> >>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>> is a >>>>>>>>>> List >>>>>>>> or >>>>>>>>>> not? >>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>>> both key >>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>> ConfigOption<List>? >>>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>>> usage of >>>>>>>>>> ConfigOption<List>. >>>>>>>>>> >>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>> different >>>>>>>> semantic >>>>>>>>>> in different component seems super confusing. For example, when >>> users >>>>>>>> set >>>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>>> There >>>>>>>>>> might be two cases: >>>>>>>>>> - If it is just the same config used by different >>>>>>>>>> components to >>>>>>>>>> act >>>>>>>>>> accordingly, it might be better to just have one config, but >>> describe >>>>>>>>>> clearly on how that config will be used. >>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>> differently, I >>>>>>>>>> think the config names should just be different. >>>>>>>>>> >>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >>> method >>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>> method >>>>>>>> sounds >>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>> works if >>> the >>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>> wondering if >>>>>>>>>> there >>>>>>>> is >>>>>>>>>> a real use case of this method. Because supposedly the >>> configurations >>>>>>>> could >>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>> >>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>> in the >>>>>>>> Google >>>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>>> history >>>>>>>> and >>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>> >>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>> <twal...@apache.org> >>>>>>>> wrote: >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >>> voting >>>>>>>>>>> thread. If there are no objections, I will start a new voting >>> thread >>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>> Hi everyone, >>>>>>>>>>>> >>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>> offline. >>> It >>>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>> will >>>>>>>>>>>> not >>>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>>> >>>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>>> with the >>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>>> options >>>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>>> For now, >>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>> locations in >>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>>> global >>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>> >>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>> proceed >>>>>>>>>>>> with >>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for your suggestions. Let me give you some background >>> about >>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>> >>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>> because we >>> did >>>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>>> Both for >>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>>> would be >>>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>> to JSON >>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >>> more >>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>>> centered >>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>> value. >>>>>>>>>>>>> The >>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>> `intType()` method etc. The current design of validators >>> centered >>>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>> introduce >>>>>>>>>>>>> types >>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>> get a >>>>>>>>>>>>> class >>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>> concept? >>>>>>>>>>>>> >>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>> nesting. As >>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>>> atomic >>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>> But >>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>> list >>>>>>>>>>>>> option >>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>> >>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>> suggestions >>>>>>>>>>>>> that I >>>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>>> entirely, >>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>> with each >>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>> Instead of >>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>> put it >>>>>>>>>>>>> close >>>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>>> failing >>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>>> idea is >>>>>>>>>>>>>> to see >>>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>>> correlation >>>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>> >>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>> "java >>>>>>>>>>>>>> types" but >>>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>>> haven't >>>>>>>>>>>>>> thought >>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>> >>>>>>>>>>>>>> (a) category type: implies string, and a fix set of >>> possible >>>>>>>>>> values. >>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>> and >>>>>>>>>>>>>> validation. >>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>>> integer, >>>>>>>> if >>>>>>>>>>>>>> we want >>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>> typical >>>>>>>>>> domain >>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>> float) >>>>>>>>>>>>>> >>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>> interval, or >>>>>>>>>>>>>> references >>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>> >>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>> parsable. >>> The >>>>>>>>>>>>>> measure's >>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>> validation >>>>>>>>>>>>>> rules. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>> validators? >>>>>>>>>>>>>> Are >>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>> loading) >>> or >>>>>>>>>>>>>> are the >>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>> specific, >>>>>>>>>>>>>> that it is >>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>> <dwysakow...@apache.org> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>>> most of >>>>>>>>>> the >>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >>> type >>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>> bit >>>>>>>>>> more >>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>> about >>>>>>>>>> it >>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>> must >>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>> configuration >>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>> that the >>>>>>>>>> values >>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>>> lists >>>>>>>>>> only >>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>>> you have >>>>>>>>>> a >>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>>> understood >>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>>> parameter >>>>>>>>>> of >>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >>> early >>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>> indication of >>> not >>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>>> simpler. >>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>> numeric >>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>> lzljs3620...@aliyun.com >>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>> string >>> to >>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>> From:Jark Wu <imj...@gmail.com> >>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>> Configuration >>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >>> long >>>>>>>>>> time. >>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>> wander4...@gmail.com >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>> with >>> the >>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>> >>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>> >
signature.asc
Description: OpenPGP digital signature