Hi, Regarding the factory and duplicate() and whatnot, wouldn’t it work to have a factory like this:
/** * Allows to read and write an instance from and to {@link Configuration}. A configurable instance * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. * */ public interface ConfigurableFactory<T extends Configurable> { /** * Creates an instance from the given configuration. */ T fromConfiguration(ConfigurationReader configuration); } with Configurable being: /** * Allows to read and write an instance from and to {@link Configuration}. A configurable instance * operates in its own key space in {@link Configuration} and will be (de)prefixed by the framework. It cannot access keys from other options. A factory must have a default constructor. * */ public interface Configurable { /** * Writes this instance to the given configuration. */ void writeToConfiguration(ConfigurationWriter configuration); // method name TBD } This would make the Configurable immutable and we wouldn’t need a duplicate() method. Best, Aljoscha > On 2. Sep 2019, at 14:40, Becket Qin <becket....@gmail.com> wrote: > > Hi Timo and Dawid, > > Thanks for the patient reply. I agree that both option a) and option b) can > solve the mutability problem. > > For option a), is it a little intrusive to add a duplicate() method for a > Configurable? It would be great if we don't put this burden on users if > possible. > > For option b), I actually feel it is slightly better than option a) from > API perspective as getFactory() seems a more understandable method of a > Configurable compared with duplicate(). And users do not need to implement > much more logic. > > I am curious what is the downside of keeping the Configuration simple to > only have primitive types, and always create the Configurable using a util > method? If Configurables are rare, do we need to put the instantiation / > bookkeeping logic in Configuration? > > @Becket for the toConfiguration this is required for shipping the >> Configuration to TaskManager, so that we do not have to use java >> serializability. > > Do you mean a Configurable may be created and configured directly without > reading settings from a Configuration instance? I thought a Configurable > will always be created via a ConfigurableFactory by extracting required > configs from a Configuration. Am I missing something? > > Thanks, > > Jiangjie (Becket) Qin > > On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <dwysakow...@apache.org> > wrote: > >> Hi Timo, Becket >> >> From the options that Timo suggested for improving the mutability >> situation I would prefer option a) as this is the more explicit option >> and simpler option. Just as a remark, I think in general Configurable >> types for options will be rather very rare for some special use-cases, >> as the majority of options are rather simpler parameters of primitive >> types (+duration, memory) >> >> @Becket for the toConfiguration this is required for shipping the >> Configuration to TaskManager, so that we do not have to use java >> serializability. >> >> Best, >> >> Dawid >> >> >> On 02/09/2019 10:05, Timo Walther wrote: >>> Hi Becket, >>> >>> Re 1 & 3: "values in configurations should actually be immutable" >>> >>> I would also prefer immutability but most of our configuration is >>> mutable due to serialization/deserialization. Also maps and list could >>> be mutable in theory. It is difficult to really enforce that for >>> nested structures. I see two options: >>> >>> a) For the original design: How about we force implementers to add a >>> duplicate() method in a Configurable object? This would mean that the >>> object is still mutable but by duplicating the object both during >>> reading and writing we would avoid the problem you described. >>> >>> b) For the current design: We still use the factory approach but let a >>> Configurable object implement a getFactory() method such that we know >>> how to serialize the object. With the help of a factory we can also >>> duplicate the object easily during reading and writing and ensure >>> immutability. >>> >>> I would personally go for approach a) to not over-engineer this topic. >>> But I'm open for option b). >>> >>> Regards, >>> Timo >>> >>> >>> On 31.08.19 04:09, Becket Qin wrote: >>>> Hi Timo, >>>> >>>> Thanks for the reply. I am still a little concerned over the >>>> mutability of >>>> the Configurable which could be the value in Configuration. >>>> >>>> Re: 1 >>>> >>>>> But in general, people should not use any internal fields. >>>>> Configurable objects are meant for simple little helper POJOs, not >>>>> complex arbitrary nested data structures. >>>> This seems difficult to enforce... Ideally the values in configurations >>>> should actually be immutable. The value can only be changed by >>>> explicitly >>>> calling setters in Configuration. Otherwise we may have weird situation >>>> where the Configurable in the same configuration are different in two >>>> places because the configurable is modified in one places and not >>>> modified >>>> in another place. So I am a little concerned on putting a >>>> Configurable type >>>> in the Configuration map, because the value could be modified without >>>> explicitly setting the configuration. For example, can users do the >>>> following? >>>> >>>> Configurable configurable = >>>> configuration.getConfigurable(myConfigurableOption); >>>> configurable.setConfigA(123); // this already changes the configurable >>>> object in the configuration. >>>> >>>> Re: 2 >>>> Thanks for confirming. As long as users will not have a situation where >>>> they need to set two configurations with the same key but different >>>> descriptions, I think it is OK. >>>> >>>> Re: 3 >>>> This is actually kind of related to 1. i.e. Whether toConfiguration() >>>> guarantees the exact same object can be rebuilt from the >>>> configuration or >>>> not. I am still not quite sure about the use case of toConfiguration() >>>> though. It seems indicating the Configurable is mutable, which might be >>>> dangerous. >>>> >>>> Thanks, >>>> >>>> Jiangjie (Becket) Qin >>>> >>>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <twal...@apache.org> >>>> wrote: >>>> >>>>> Hi Becket, >>>>> >>>>> 1. First of all, you are totally right. The FLIP contains a bug due to >>>>> the last minute changes that Dawid suggested: by having immutable >>>>> objects created by a factory we loose the serializability of the >>>>> Configuration because the factory itself is not stored in the >>>>> Configuration. I would propose to revert the last change and stick to >>>>> the original design, which means that a object must implement the >>>>> Configurable interface and also implements >>>>> serialization/deserialization >>>>> methods such that also internal fields can be persisted as you >>>>> suggested. But in general, people should not use any internal fields. >>>>> Configurable objects are meant for simple little helper POJOs, not >>>>> complex arbitrary nested data structures. >>>>> >>>>> It is Map<String, Object> because Configuration stores the raw objects. >>>>> If you put a Boolean option into it, it remains Boolean. This makes the >>>>> map very efficient for shipping to the cluster and accessing options >>>>> multiple times. The same for configurable objects. We put the pure >>>>> objects into the map without any serialization/deserialization. The >>>>> provided factory allows to convert the Object into a Configuration and >>>>> we know how to serialize/deserializise a configuration because it is >>>>> just a key/value map. >>>>> >>>>> 2. Yes, this is what we had in mind. It should still be the same >>>>> configuration option. We would like to avoid specialized option keys >>>>> across components (exec.max-para and table.exec.max-para) if they are >>>>> describing basically the same thing. But adding some more description >>>>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2" >>>>> does not hurt. >>>>> >>>>> 3. They should restore the original object given that the >>>>> toConfiguration/fromConfiguration methods have been implemented >>>>> correctly. I will extend the example to make the logic clearer while >>>>> fixing the bug. >>>>> >>>>> Thanks for the healthy discussion, >>>>> Timo >>>>> >>>>> >>>>> On 30.08.19 15:29, Becket Qin wrote: >>>>>> Hi Timo, >>>>>> >>>>>> Thanks again for the clarification. Please see a few more questions >>>>> below. >>>>>> Re: 1 >>>>>> >>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>> strings, it manages a Map<String, Object> for efficient access. Every >>>>>>> map entry can have a string representation for persistence, but in >>>>>>> most >>>>>>> cases consists of unserialized objects. >>>>>> I'd like to understand this a bit more. The reason we have a >>>>>> Map<String, >>>>>> Object> in Configuration was because that Object could be either a >>>>> String, >>>>>> a List or a Map, right? But they eventually always boil down to >>>>>> Strings, >>>>> or >>>>>> maybe one of the predefined type that we know how to serialize. In the >>>>>> current design, can the Object also be Configurable? >>>>>> If the value in the config Map<String, Object> can be Configurable >>>>> objects, >>>>>> how do we serialize them? Calling toConfiguration() seems not quite >>>>> working >>>>>> because there might be some other internal fields that are not part of >>>>> the >>>>>> configuration. The modification to those fields will be lost if we >>>>>> simply >>>>>> use toConfiguration(). So the impact of modifying those Configurable >>>>>> objects seems a little undefined. And it would be difficult to prevent >>>>>> users from doing that. >>>>>> If the value in the config Map<String, Object> cannot be Configurable >>>>>> objects, then it seems a little weird to have all the other ConfigType >>>>>> stored in the ConfigMap in their own native type and accessed via >>>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be >>>>>> different from others because one have to use ConfigurableFactory >>>>>> to get >>>>>> the configurations. >>>>>> >>>>>> Re: 2 >>>>>> >>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>> different place, so that the option is easier to find in a different >>>>>>> module from it was defined. >>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>> equal >>>>> to: >>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>> } >>>>>> Just to make sure I understand it correctly, does that mean users will >>>>> see >>>>>> something like following? >>>>>> - CoreOptions.MAX_PARALLELISM with description_1; >>>>>> - TableOptions.MAX_PARALLELISM with description_1 + description_2. >>>>>> - DataStreamOptions.MAX_PARALLELISM with description_1 + >>>>>> description_3. >>>>>> And users will only configure exactly one MAX_PARALLELISM cross the >>>>> board. >>>>>> So they won't be confused by setting two MAX_PARALLELISM config for >>>>>> two >>>>>> different modules, while they are actually the same config. If that is >>>>> the >>>>>> case, I don't have further concern. >>>>>> >>>>>> Re: 3 >>>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes() >>>>>> actually >>>>>> restore the original Object. But fromConfiguration() and >>>>> toConfiguration() >>>>>> does not do that, anything not in the configuration of the original >>>>> object >>>>>> will be lost. So it would be good to make that clear. Maybe a clear >>>>>> Java >>>>>> doc is sufficient. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Jiangjie (Becket) Qin >>>>>> >>>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz >>>>>> <dwysakow...@apache.org >>>>>> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> Ad. 1 >>>>>>> >>>>>>> The advantage of our approach is that you have the type definition >>>>>>> close >>>>>>> to the option definition. The only difference is that it enables >>>>>>> expressing simple pojos with the primitives like >>>>>>> ConfigOption<Integer>, >>>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having >>>>>>> >>>>>>> the parsing logic scattered everywhere in the code base as it is now. >>>>>>> The string representation in our proposal is exactly the same as you >>>>>>> explained for those three options. The only difference is that you >>>>>>> don't >>>>>>> have to parse the elements of a List, Map etc. afterwards. >>>>>>> >>>>>>> Ad. 2 >>>>>>> >>>>>>> I think about the withExtendedDescription as a helper getter in a >>>>>>> different place, so that the option is easier to find in a different >>>>>>> module from it was defined. >>>>>>> >>>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be >>>>>>> equal >>>>> to: >>>>>>> public ConfigOption getMaxParallelismOption() { >>>>>>> >>>>>>> return CoreOptions.MAX_PARALLELISM; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> This allows to further clarify the description of the option in the >>>>>>> context of a different module and end up in a seperate page in >>>>>>> documentation (but with a link to the original one). In the end it is >>>>>>> exactly the same option. It has exactly same key, type, parsing >>>>>>> logic, >>>>>>> it is in the end forwarded to the same place. >>>>>>> >>>>>>> Ad. 3 >>>>>>> >>>>>>> Not sure if I understand your concerns here. As Timo said it is in >>>>>>> the >>>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a >>>>>>> Configuration. Also just wanted to make sure we adjusted this part >>>>>>> slightly and now the ConfigOption takes ConfigurableFactory. >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> >>>>>>> On 30/08/2019 09:39, Timo Walther wrote: >>>>>>>> Hi Becket, >>>>>>>> >>>>>>>> thanks for the discussion. >>>>>>>> >>>>>>>> 1. ConfigOptions in their current design are bound to classes. >>>>>>>> Regarding, the option is "creating some Configurable objects instead >>>>>>>> of defining the config to create >>>>>>>> those Configurable"? We just moved this logic to a factory, this >>>>>>>> factory can then also be used for other purposes. However, how the >>>>>>>> option and objects are serialized to Configuration is still not part >>>>>>>> of the option. The option is just pure declaration. >>>>>>>> >>>>>>>> If we would allow only List<String>, implementers would need to >>>>>>>> start >>>>>>>> implementing own parsing and validation logic all the time. We would >>>>>>>> like to avoid that. >>>>>>>> >>>>>>>> Please also keep in mind that Configuration must not consist of only >>>>>>>> strings, it manages a Map<String, Object> for efficient access. >>>>>>>> Every >>>>>>>> map entry can have a string representation for persistence, but in >>>>>>>> most cases consists of unserialized objects. >>>>>>>> >>>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite >>>>>>>> keys, types or default values. But different layers might want to >>>>>>>> add >>>>>>>> helpful information. In our concrete use case for FLIP-59, >>>>>>>> ExecutionConfig has 50 properties and many of them are not relevant >>>>>>>> for the Table layer or have no effect at all. We would like to list >>>>>>>> and mention the most important config options again in the Table >>>>>>>> Configuration section, so that users are not confused, but with a >>>>>>>> strong link to the core option. E.g.: registered kryo serializers >>>>>>>> are >>>>>>>> also important also for Table users, we would like to add the >>>>>>>> comment >>>>>>>> "This option allows to modify the serialization of the ANY SQL data >>>>>>>> type.". I think we should not spam the core configuration page with >>>>>>>> comments from all layers, connectors, or libraries but keep this in >>>>>>>> the corresponding component documentation. >>>>>>>> >>>>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes >>>>>>>> and deserializes T from a configuration? >>>>>>>> >>>>>>>> Regards, >>>>>>>> Timo >>>>>>>> >>>>>>>> On 29.08.19 19:14, Becket Qin wrote: >>>>>>>>> Hi Timo and Stephan, >>>>>>>>> >>>>>>>>> Thanks for the detail explanation. >>>>>>>>> >>>>>>>>> 1. I agree that each config should be in a human readable >>>>>>>>> format. My >>>>>>>>> concern is that the current List<Configurable> looks going a little >>>>>>>>> too far >>>>>>>>> from what the configuration is supposed to do. They are essentially >>>>>>>>> creating some Configurable objects instead of defining the >>>>>>>>> config to >>>>>>>>> create >>>>>>>>> those Configurable. This mixes ConfigOption and the usage of it. >>>>>>>>> API >>>>>>>>> wise >>>>>>>>> it would be good to keep the configs and their usages (such as >>>>>>>>> how to >>>>>>>>> create objects using the ConfigOption) apart from each other. >>>>>>>>> I am wondering if we can just make List also only take string. For >>>>>>>>> example, >>>>>>>>> is the following definition of map and list sufficient? >>>>>>>>> >>>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be >>>>>>>>> defined >>>>>>>>> as: >>>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ... >>>>>>>>> >>>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined >>>>> as: >>>>>>>>> list_config_name: v1, v2, v3, ... >>>>>>>>> >>>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String, >>>>>>>>> String>>. It >>>>>>>>> can >>>>>>>>> be defined as: >>>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;.... >>>>>>>>> >>>>>>>>> All the key and values in the configuration are String. This also >>>>>>>>> guarantees that the configuration is always serializable. >>>>>>>>> If we want to do one more step, we can allow the ConfigOption to >>>>>>>>> set >>>>> all >>>>>>>>> the primitive types and parse that for the users. So something like >>>>>>>>> List<Integer>, List<Class<?>> seems fine. >>>>>>>>> >>>>>>>>> The configuration class could also have util methods to create a >>>>>>>>> list >>>>> of >>>>>>>>> configurable such as: >>>>>>>>> <T> List<T> >>>>> <Configuration#getConfigurableInstances(ListMapConfigOption, >>>>>>>>> Class<T> clazz). >>>>>>>>> But the configuration class will not take arbitrary Configurable as >>>>> the >>>>>>>>> value of its config. >>>>>>>>> >>>>>>>>> 2. I might have misunderstood this. But my concern on the >>>>>>>>> description >>>>>>>>> extension is in the following example. >>>>>>>>> >>>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM = >>>>>>>>> >>>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription( >>>>>>>>> "Note: That this property means that a table program has a >>>>>>>>> side-effect >>>>>>>>> XYZ."); >>>>>>>>> >>>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is >>>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I >>>>>>>>> suppose >>>>>>>>> users >>>>>>>>> will see both configurations. One with an extended description >>>>>>>>> and one >>>>>>>>> without. Let's say there is a third component which also users >>>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM >>>>>>>>> ConfigOption? If >>>>>>>>> so, what would that ConfigOption's description look like? >>>>>>>>> >>>>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM >>>>>>>>> and the >>>>>>>>> description should clearly state all the usage of this >>>>>>>>> ConfigOption. >>>>>>>>> >>>>>>>>> 3. I see, in that case, how about we name it something like >>>>>>>>> extractConfiguration()? I am just trying to see if we can make it >>>>> clear >>>>>>>>> this is not something like fromBytes() and toBytes(). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>> >>>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <twal...@apache.org> >>>>>>> wrote: >>>>>>>>>> Hi Becket, >>>>>>>>>> >>>>>>>>>> let me try to clarify some of your questions: >>>>>>>>>> >>>>>>>>>> 1. For every option, we also needed to think about how to >>>>>>>>>> represent >>>>> it >>>>>>>>>> in a human readable format. We do not want to allow arbitrary >>>>>>>>>> nesting >>>>>>>>>> because that would easily allow to bypass the flattened >>>>>>>>>> hierarchy of >>>>>>>>>> config options (`session.memory.min`). The current design >>>>>>>>>> allows to >>>>>>>>>> represent every option type as a list. E.g.: >>>>>>>>>> >>>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12` >>>>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption: >>>>>>>>>> field=12,other=true; field=12,other=true` >>>>>>>>>> `myPropertyOption: key=str0,other=str1` can be >>>>>>>>>> `myPropertyListOption: >>>>>>>>>> key=str0,other=str1;key=str0,other=str1` >>>>>>>>>> >>>>>>>>>> We need the atomic class for serialization/deserialization both in >>>>>>>>>> binary and string format. >>>>>>>>>> >>>>>>>>>> ConfigOption<List> is not present in the code base yet, but this >>>>>>>>>> FLIP is >>>>>>>>>> a preparation of making ExecutionConfig configurable. If you look >>>>> into >>>>>>>>>> this class or also in existing table connectors/formats, you >>>>>>>>>> will see >>>>>>>>>> that each proposed option type has its requirements. >>>>>>>>>> >>>>>>>>>> 2. Regarding extending the description of ConfigOptions, the >>>>>>>>>> semantic of >>>>>>>>>> one option should be a super set of the other option. E.g. in >>>>>>>>>> Table >>>>> API >>>>>>>>>> we might use general ExecutionConfig properties. But we would like >>>>>>>>>> to a) >>>>>>>>>> make external options more prominent in the Table API config >>>>>>>>>> docs to >>>>>>>>>> link people to properties they should pay attention b) notice >>>>>>>>>> about >>>>>>>>>> side >>>>>>>>>> effects. The core semantic of a property should not change. >>>>>>>>>> >>>>>>>>>> 3. The factory will not receive the entire configuration but works >>>>> in a >>>>>>>>>> separate key space. For `myObjectOption` above, it would receive a >>>>>>>>>> configuration that consists of `field: 12` and `other: true`. >>>>>>>>>> >>>>>>>>>> I agree. I will convert the document into a Wiki page today. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote: >>>>>>>>>>> @Becket One thing that may be non-obvious is that the >>>>>>>>>>> Configuration >>>>>>>>>>> class >>>>>>>>>>> also defines serialization / persistence logic at the moment. >>>>>>>>>>> So it >>>>>>>>>>> needs >>>>>>>>>>> to know the set of types it supports. That stands in the way >>>>>>>>>>> of an >>>>>>>>>>> arbitrary generic map type. >>>>>>>>>>> >>>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one >>>>>>>>>>> collection orthogonal to the type (List) and another one bound to >>>>>>>>>> specific >>>>>>>>>>> types (Map). >>>>>>>>>>> >>>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <becket....@gmail.com >>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Timo, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I >>>>>>>>>>>> have a >>>>>>>>>>>> few >>>>>>>>>>>> questions / comments. >>>>>>>>>>>> >>>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption? >>>>>>>>>>>> Would it be enough to just check the atomicClass to see if it >>>>>>>>>>>> is a >>>>>>>>>>>> List >>>>>>>>>> or >>>>>>>>>>>> not? >>>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume >>>>>>>>>>>> both key >>>>>>>>>>>> and value types are String? Can we just apply the same to the >>>>>>>>>>>> ConfigOption<List>? >>>>>>>>>>>> BTW, I did a quick search in the codebase but did not find any >>>>>>>>>>>> usage of >>>>>>>>>>>> ConfigOption<List>. >>>>>>>>>>>> >>>>>>>>>>>> 2. The same config name, but with two ConfigOption with >>>>>>>>>>>> different >>>>>>>>>> semantic >>>>>>>>>>>> in different component seems super confusing. For example, when >>>>> users >>>>>>>>>> set >>>>>>>>>>>> both configs, they may have no idea one is overriding the other. >>>>>>>>>>>> There >>>>>>>>>>>> might be two cases: >>>>>>>>>>>> - If it is just the same config used by different >>>>>>>>>>>> components to >>>>>>>>>>>> act >>>>>>>>>>>> accordingly, it might be better to just have one config, but >>>>> describe >>>>>>>>>>>> clearly on how that config will be used. >>>>>>>>>>>> - If it is actually two configurations that can be set >>>>>>>>>>>> differently, I >>>>>>>>>>>> think the config names should just be different. >>>>>>>>>>>> >>>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration() >>>>> method >>>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration() >>>>>>>>>>>> method >>>>>>>>>> sounds >>>>>>>>>>>> like converting an object to a configuration, which only >>>>>>>>>>>> works if >>>>> the >>>>>>>>>>>> object does not contain any state / value. I am also >>>>>>>>>>>> wondering if >>>>>>>>>>>> there >>>>>>>>>> is >>>>>>>>>>>> a real use case of this method. Because supposedly the >>>>> configurations >>>>>>>>>> could >>>>>>>>>>>> just be passed around to caller of this method. >>>>>>>>>>>> >>>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of >>>>>>>>>>>> in the >>>>>>>>>> Google >>>>>>>>>>>> doc before voting? The FLIP wiki allows track the modification >>>>>>>>>>>> history >>>>>>>>>> and >>>>>>>>>>>> has a more established structure to ensure nothing is missed. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Jiangjie (Becket) Qin >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther >>>>>>>>>>>> <twal...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the >>>>> voting >>>>>>>>>>>>> thread. If there are no objections, I will start a new voting >>>>> thread >>>>>>>>>>>>> tomorrow at 9am Berlin time. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote: >>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks for all the feedback we have received online and >>>>>>>>>>>>>> offline. >>>>> It >>>>>>>>>>>>>> showed that many people support the idea of evolving the Flink >>>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP >>>>>>>>>>>>>> will >>>>>>>>>>>>>> not >>>>>>>>>>>>>> solve all issues but at least will improve the current status. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We've updated the document and replaced the Correlation part >>>>>>>>>>>>>> with the >>>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available >>>>>>>>>>>>>> options >>>>>>>>>>>>>> of a group plus custom group validators for eager validation. >>>>>>>>>>>>>> For now, >>>>>>>>>>>>>> this eager group validation will only be used at certain >>>>>>>>>>>>>> locations in >>>>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire >>>>>>>>>>>>>> global >>>>>>>>>>>>>> configuration before submitting a job in the future. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please take another look if you find time. I hope we can >>>>>>>>>>>>>> proceed >>>>>>>>>>>>>> with >>>>>>>>>>>>>> the voting process if there are no objections. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther: >>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> thanks for your suggestions. Let me give you some background >>>>> about >>>>>>>>>>>>>>> the decisions made in this FLIP: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" >>>>>>>>>>>>>>> because we >>>>> did >>>>>>>>>>>>>>> not want to change the entire configuration infrastructure. >>>>>>>>>>>>>>> Both for >>>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that >>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>> required to update all options. If our goal is to rework the >>>>>>>>>>>>>>> configuration option entirely, I might suggest to switch >>>>>>>>>>>>>>> to JSON >>>>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting >>>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the >>>>> more >>>>>>>>>>>>>>> nested structures are allowed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is >>>>>>>>>>>>>>> centered >>>>>>>>>>>>>>> around Java classes where T is determined by the default >>>>>>>>>>>>>>> value. >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit >>>>>>>>>>>>>>> `intType()` method etc. The current design of validators >>>>> centered >>>>>>>>>>>>>>> around Java classes makes it possible to have typical domain >>>>>>>>>>>>>>> validators baked by generics as you suggested. If we >>>>>>>>>>>>>>> introduce >>>>>>>>>>>>>>> types >>>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to >>>>>>>>>>>>>>> get a >>>>>>>>>>>>>>> class >>>>>>>>>>>>>>> out of this option at the end, so why changing a proven >>>>>>>>>>>>>>> concept? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary >>>>>>>>>>>>>>> nesting. As >>>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every >>>>>>>>>>>>>>> atomic >>>>>>>>>>>>>>> option like "key=12" can be represented by a list >>>>>>>>>>>>>>> "keys=12;13". >>>>>>>>>>>>>>> But >>>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated >>>>>>>>>>>>>>> list >>>>>>>>>>>>>>> option >>>>>>>>>>>>>>> would start making this more complicated such as >>>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...), >>>>>>>>>>>>>>> StringOption(...)))", do we want that? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the >>>>>>>>>>>>>>> suggestions >>>>>>>>>>>>>>> that I >>>>>>>>>>>>>>> like least in the document. We can also discuss removing it >>>>>>>>>>>>>>> entirely, >>>>>>>>>>>>>>> but I think it solves the use case of relating options >>>>>>>>>>>>>>> with each >>>>>>>>>>>>>>> other in a flexible way right next to the actual option. >>>>>>>>>>>>>>> Instead of >>>>>>>>>>>>>>> being hidden in some component initialization, we should >>>>>>>>>>>>>>> put it >>>>>>>>>>>>>>> close >>>>>>>>>>>>>>> to the option to also perform validation eagerly instead of >>>>>>>>>>>>>>> failing >>>>>>>>>>>>>>> at runtime when the option is accessed the first time. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen: >>>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The >>>>>>>>>>>>>>>> idea is >>>>>>>>>>>>>>>> to see >>>>>>>>>>>>>>>> if something like that can ease validation. Especially the >>>>>>>>>>>> correlation >>>>>>>>>>>>>>>> system seems quite complex (proxies to work around order of >>>>>>>>>>>>>>>> initialization). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For example, let's assume we don't think primarily about >>>>>>>>>>>>>>>> "java >>>>>>>>>>>>>>>> types" but >>>>>>>>>>>>>>>> would define types as one of the following (just examples, >>>>>>>>>>>>>>>> haven't >>>>>>>>>>>>>>>> thought >>>>>>>>>>>>>>>> all the details through): >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (a) category type: implies string, and a fix set of >>>>> possible >>>>>>>>>>>> values. >>>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>> Maps to a String or Enum in Java. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (b) numeric integer type: implies long (or optionally >>>>>>>>>>>>>>>> integer, >>>>>>>>>> if >>>>>>>>>>>>>>>> we want >>>>>>>>>>>>>>>> to automatically check overflow / underflow). would take >>>>> typical >>>>>>>>>>>> domain >>>>>>>>>>>>>>>> validators, like non-negative, etc. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (c) numeric real type: same as above (double or >>>>>>>>>>>>>>>> float) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (d) numeric interval type: either defined as an >>>>> interval, or >>>>>>>>>>>>>>>> references >>>>>>>>>>>>>>>> other parameter by key. validation by valid interval. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (e) quantity: a measure and a unit. separately >>>>>>>>>>>>>>>> parsable. >>>>> The >>>>>>>>>>>>>>>> measure's >>>>>>>>>>>>>>>> type could be any of the numeric types above, with same >>>>>>>>>>>>>>>> validation >>>>>>>>>>>>>>>> rules. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> With a system like the above, would we still correlation >>>>>>>>>>>>>>>> validators? >>>>>>>>>>>>>>>> Are >>>>>>>>>>>>>>>> there still cases that we need to catch early (config >>>>>>>>>>>>>>>> loading) >>>>> or >>>>>>>>>>>>>>>> are the >>>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup >>>>> specific, >>>>>>>>>>>>>>>> that it is >>>>>>>>>>>>>>>> fine to handle them in component initialization? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz >>>>>>>>>>>>>>>> <dwysakow...@apache.org> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Stephan, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you for your opinion. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the >>>>>>>>>>>>>>>>> most of >>>>>>>>>>>> the >>>>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown >>>>> type >>>>>>>>>>>>>>>>> system, >>>>>>>>>>>>>>>>> a field like isList may look weird. Please let me >>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>> bit >>>>>>>>>>>> more >>>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear >>>>>>>>>>>>>>>>> enough >>>>>>>>>>>>>>>>> about >>>>>>>>>>>> it >>>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is >>>>>>>>>>>>>>>>> that they >>>>>>>>>>>>>>>>> must >>>>>>>>>>>>>>>>> have a string representation as they might come from a >>>>>>>>>> configuration >>>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so >>>>>>>>>>>>>>>>> that the >>>>>>>>>>>> values >>>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not >>>>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>>>> add a >>>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for >>>>>>>>>>>>>>>>> lists >>>>>>>>>>>> only >>>>>>>>>>>>>>>>> (and flat objects - I think though in the current design >>>>>>>>>>>>>>>>> there is a >>>>>>>>>>>>>>>>> mistake around the Configurable interface). I think though >>>>>>>>>>>>>>>>> you have >>>>>>>>>>>> a >>>>>>>>>>>>>>>>> point here and it would be better to have a >>>>>>>>>>>>>>>>> ListConfigOption >>>>>>>>>>>>>>>>> instead of >>>>>>>>>>>>>>>>> this field. Does it make sense to you? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I >>>>>>>>>>>> understood >>>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from >>>>>>>>>>>>>>>>> Configuration that means they can be bound to the generic >>>>>>>>>>>>>>>>> parameter >>>>>>>>>>>> of >>>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends >>>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the >>>>>>>>>>>>>>>>> ConfigOption >>>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you >>>>>>>>>>>>>>>>> elaborate a >>>>>>>>>>>>>>>>> bit >>>>>>>>>>>>>>>>> more what did you mean? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dawid >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote: >>>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do >>>>> early >>>>>>>>>>>>>>>>> validation. >>>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad >>>>>>>>>>>>>>>>>> hoc, >>>>>>>>>>>>>>>>>> though. For >>>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear >>>>>>>>>>>>>>>>>> indication of >>>>> not >>>>>>>>>>>>>>>>>> having >>>>>>>>>>>>>>>>>> thought through the type/category system. >>>>>>>>>>>>>>>>>> Also, having a more clear category system makes validation >>>>>>>>>> simpler. >>>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between >>>>> numeric >>>>>>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible >>>>>>>>>>>>>>>>>> values), >>>>>>>>>>>>>>>>>> quantities >>>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit), >>>>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>> results in >>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>> elegant system for validation. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee < >>>>>>>>>>>> lzljs3620...@aliyun.com >>>>>>>>>>>>>>>>> .invalid> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design. >>>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of >>>>>>>>>>>>>>>>>>> various >>>>>>>>>>>>>>>>>>> modules to be unified. >>>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new >>>>>>>>>>>>>>>>>>> unified >>>>>>>>>>>>>>>>>>> TableEnvironment available for production. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations, >>>>>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The >>>>>>>>>>>>>>>>>>> skew may >>>>>>>>>>>>>>>>>>> be a single field or a combination of multiple >>>>>>>>>>>>>>>>>>> fields. >>>>>>>>>>>>>>>>>>> So the >>>>>>>>>>>>>>>>>>> configuration is very troublesome. We used JSON >>>>>>>>>>>>>>>>>>> string >>>>> to >>>>>>>>>>>>>>>>>>> configure it. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Jingsong Lee >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>>>>>>>>>>> From:Jark Wu <imj...@gmail.com> >>>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44 >>>>>>>>>>>>>>>>>>> To:dev <dev@flink.apache.org> >>>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and >>>>>>>>>>>> Configuration >>>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a >>>>> long >>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink >>>>>>>>>>>>>>>>>>> configurations and >>>>>>>>>>>>>>>>> connector >>>>>>>>>>>>>>>>>>> properties in 1.9 release. >>>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed >>>>>>>>>>>>>>>>>>> design. >>>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen < >>>>> wander4...@gmail.com >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Timo, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which >>>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of >>>>>>>>>>>>>>>>>>>> Flink. >>>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment >>>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by >>>>>>>>>>>> configuration. >>>>>>>>>>>>>>>>>>>> Will take a look at the document in days. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> tison. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> 于2019年8月16日周五 >>>>>>>>>>>>>>>>>>>> 下午10:12写道: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of >>>>>>>>>>>>>>>>>>>>> ExecutionConfig and >>>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is >>>>>>>>>>>>>>>>>>>>> necessary >>>>>>>>>>>>>>>>>>>>> to make >>>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally, >>>>>>>>>>>>>>>>>>>>> with >>>>> the >>>>>>>>>>>>>>>>>>>>> new SQL >>>>>>>>>>>>>>>>>>> DDL >>>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and >>>>>>>>>>>>>>>>>>>>> formats >>>>>>>>>>>>>>>>>>>>> coming up, >>>>>>>>>>>>>>>>>>>>> unified configuration becomes more important. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We need more features around string-based configuration >>>>>>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>>>>>> future, >>>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose >>>>>>>>>>>>>>>>>>>>> FLIP-54 for >>>>>>>>>>>>>>>>>>>>> evolving >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit >>>>> >>>>>>>>>>>>>>>>>>>>> In summary it adds: >>>>>>>>>>>>>>>>>>>>> - documented types and validation >>>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list >>>>>>>>>>>>>>>>>>>>> - simple non-nested object types >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback, >>>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >>> >> >>