Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Dawid Wysakowicz Mon, 02 Sep 2019 01:48:08 -0700

Hi Timo, Becket

From the options that Timo suggested for improving the mutability
situation I would prefer option a) as this is the more explicit option
and simpler option. Just as a remark, I think in general Configurable
types for options will be rather very rare for some special use-cases,
as the majority of options are rather simpler parameters of primitive
types (+duration, memory)


@Becket for the toConfiguration this is required for shipping the
Configuration to TaskManager, so that we do not have to use java
serializability.

Best,

Dawid


On 02/09/2019 10:05, Timo Walther wrote:
> Hi Becket,
>
> Re 1 & 3: "values in configurations should actually be immutable"
>
> I would also prefer immutability but most of our configuration is
> mutable due to serialization/deserialization. Also maps and list could
> be mutable in theory. It is difficult to really enforce that for
> nested structures. I see two options:
>
> a) For the original design: How about we force implementers to add a
> duplicate() method in a Configurable object? This would mean that the
> object is still mutable but by duplicating the object both during
> reading and writing we would avoid the problem you described.
>
> b) For the current design: We still use the factory approach but let a
> Configurable object implement a getFactory() method such that we know
> how to serialize the object. With the help of a factory we can also
> duplicate the object easily during reading and writing and ensure
> immutability.
>
> I would personally go for approach a) to not over-engineer this topic.
> But I'm open for option b).
>
> Regards,
> Timo
>
>
> On 31.08.19 04:09, Becket Qin wrote:
>> Hi Timo,
>>
>> Thanks for the reply. I am still a little concerned over the
>> mutability of
>> the Configurable which could be the value in Configuration.
>>
>> Re: 1
>>
>>> But in general, people should not use any internal fields.
>>> Configurable objects are meant for simple little helper POJOs, not
>>> complex arbitrary nested data structures.
>> This seems difficult to enforce... Ideally the values in configurations
>> should actually be immutable. The value can only be changed by
>> explicitly
>> calling setters in Configuration. Otherwise we may have weird situation
>> where the Configurable in the same configuration are different in two
>> places because the configurable is modified in one places and not
>> modified
>> in another place. So I am a little concerned on putting a
>> Configurable type
>> in the Configuration map, because the value could be modified without
>> explicitly setting the configuration. For example, can users do the
>> following?
>>
>> Configurable configurable =
>> configuration.getConfigurable(myConfigurableOption);
>> configurable.setConfigA(123); // this already changes the configurable
>> object in the configuration.
>>
>> Re: 2
>> Thanks for confirming. As long as users will not have a situation where
>> they need to set two configurations with the same key but different
>> descriptions, I think it is OK.
>>
>> Re: 3
>> This is actually kind of related to 1. i.e. Whether toConfiguration()
>> guarantees the exact same object can be rebuilt from the
>> configuration or
>> not. I am still not quite sure about the use case of toConfiguration()
>> though. It seems indicating the Configurable is mutable, which might be
>> dangerous.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <twal...@apache.org>
>> wrote:
>>
>>> Hi Becket,
>>>
>>> 1. First of all, you are totally right. The FLIP contains a bug due to
>>> the last minute changes that Dawid suggested: by having immutable
>>> objects created by a factory we loose the serializability of the
>>> Configuration because the factory itself is not stored in the
>>> Configuration. I would propose to revert the last change and stick to
>>> the original design, which means that a object must implement the
>>> Configurable interface and also implements
>>> serialization/deserialization
>>> methods such that also internal fields can be persisted as you
>>> suggested. But in general, people should not use any internal fields.
>>> Configurable objects are meant for simple little helper POJOs, not
>>> complex arbitrary nested data structures.
>>>
>>> It is Map<String, Object> because Configuration stores the raw objects.
>>> If you put a Boolean option into it, it remains Boolean. This makes the
>>> map very efficient for shipping to the cluster and accessing options
>>> multiple times. The same for configurable objects. We put the pure
>>> objects into the map without any serialization/deserialization. The
>>> provided factory allows to convert the Object into a Configuration and
>>> we know how to serialize/deserializise a configuration because it is
>>> just a key/value map.
>>>
>>> 2. Yes, this is what we had in mind. It should still be the same
>>> configuration option. We would like to avoid specialized option keys
>>> across components (exec.max-para and table.exec.max-para) if they are
>>> describing basically the same thing. But adding some more description
>>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2"
>>> does not hurt.
>>>
>>> 3. They should restore the original object given that the
>>> toConfiguration/fromConfiguration methods have been implemented
>>> correctly. I will extend the example to make the logic clearer while
>>> fixing the bug.
>>>
>>> Thanks for the healthy discussion,
>>> Timo
>>>
>>>
>>> On 30.08.19 15:29, Becket Qin wrote:
>>>> Hi Timo,
>>>>
>>>> Thanks again for the clarification. Please see a few more questions
>>> below.
>>>> Re: 1
>>>>
>>>>> Please also keep in mind that Configuration must not consist of only
>>>>> strings, it manages a Map<String, Object> for efficient access. Every
>>>>> map entry can have a string representation for persistence, but in
>>>>> most
>>>>> cases consists of unserialized objects.
>>>> I'd like to understand this a bit more. The reason we have a
>>>> Map<String,
>>>> Object> in Configuration was because that Object could be either a
>>> String,
>>>> a List or a Map, right? But they eventually always boil down to
>>>> Strings,
>>> or
>>>> maybe one of the predefined type that we know how to serialize. In the
>>>> current design, can the Object also be Configurable?
>>>> If the value in the config Map<String, Object> can be Configurable
>>> objects,
>>>> how do we serialize them? Calling toConfiguration() seems not quite
>>> working
>>>> because there might be some other internal fields that are not part of
>>> the
>>>> configuration. The modification to those fields will be lost if we
>>>> simply
>>>> use toConfiguration(). So the impact of modifying those Configurable
>>>> objects seems a little undefined. And it would be difficult to prevent
>>>> users from doing that.
>>>> If the value in the config Map<String, Object> cannot be Configurable
>>>> objects, then it seems a little weird to have all the other ConfigType
>>>> stored in the ConfigMap in their own native type and accessed via
>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be
>>>> different from others because one have to use ConfigurableFactory
>>>> to get
>>>> the configurations.
>>>>
>>>> Re: 2
>>>>
>>>>> I think about the withExtendedDescription as a helper getter in a
>>>>> different place, so that the option is easier to find in a different
>>>>> module from it was defined.
>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be
>>>>> equal
>>> to:
>>>>> public ConfigOption getMaxParallelismOption() {
>>>>>       return CoreOptions.MAX_PARALLELISM;
>>>>> }
>>>> Just to make sure I understand it correctly, does that mean users will
>>> see
>>>> something like following?
>>>>    - CoreOptions.MAX_PARALLELISM with description_1;
>>>>    - TableOptions.MAX_PARALLELISM with description_1 + description_2.
>>>>    - DataStreamOptions.MAX_PARALLELISM with description_1 +
>>>> description_3.
>>>> And users will only configure exactly one MAX_PARALLELISM cross the
>>> board.
>>>> So they won't be confused by setting two MAX_PARALLELISM config for
>>>> two
>>>> different modules, while they are actually the same config. If that is
>>> the
>>>> case, I don't have further concern.
>>>>
>>>> Re: 3
>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes()
>>>> actually
>>>> restore the original Object. But fromConfiguration() and
>>> toConfiguration()
>>>> does not do that, anything not in the configuration of the original
>>> object
>>>> will be lost. So it would be good to make that clear. Maybe a clear
>>>> Java
>>>> doc is sufficient.
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz
>>>> <dwysakow...@apache.org
>>>>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Ad. 1
>>>>>
>>>>> The advantage of our approach is that you have the type definition
>>>>> close
>>>>> to the option definition. The only difference is that it enables
>>>>> expressing simple pojos with the primitives like
>>>>> ConfigOption<Integer>,
>>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having
>>>>>
>>>>> the parsing logic scattered everywhere in the code base as it is now.
>>>>> The string representation in our proposal is exactly the same as you
>>>>> explained for those three options. The only difference is that you
>>>>> don't
>>>>> have to parse the elements of a List, Map etc. afterwards.
>>>>>
>>>>> Ad. 2
>>>>>
>>>>> I think about the withExtendedDescription as a helper getter in a
>>>>> different place, so that the option is easier to find in a different
>>>>> module from it was defined.
>>>>>
>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be
>>>>> equal
>>> to:
>>>>> public ConfigOption getMaxParallelismOption() {
>>>>>
>>>>>       return CoreOptions.MAX_PARALLELISM;
>>>>>
>>>>> }
>>>>>
>>>>> This allows to further clarify the description of the option in the
>>>>> context of a different module and end up in a seperate page in
>>>>> documentation (but with a link to the original one). In the end it is
>>>>> exactly the same option. It has exactly same key, type, parsing
>>>>> logic,
>>>>> it is in the end forwarded to the same place.
>>>>>
>>>>> Ad. 3
>>>>>
>>>>> Not sure if I understand your concerns here. As Timo said it is in
>>>>> the
>>>>> end sth similar to toBytes/fromBytes, but it puts itself to a
>>>>> Configuration. Also just wanted to make sure we adjusted this part
>>>>> slightly and now the ConfigOption takes ConfigurableFactory.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dawid
>>>>>
>>>>>
>>>>> On 30/08/2019 09:39, Timo Walther wrote:
>>>>>> Hi Becket,
>>>>>>
>>>>>> thanks for the discussion.
>>>>>>
>>>>>> 1. ConfigOptions in their current design are bound to classes.
>>>>>> Regarding, the option is "creating some Configurable objects instead
>>>>>> of defining the config to create
>>>>>> those Configurable"? We just moved this logic to a factory, this
>>>>>> factory can then also be used for other purposes. However, how the
>>>>>> option and objects are serialized to Configuration is still not part
>>>>>> of the option. The option is just pure declaration.
>>>>>>
>>>>>> If we would allow only List<String>, implementers would need to
>>>>>> start
>>>>>> implementing own parsing and validation logic all the time. We would
>>>>>> like to avoid that.
>>>>>>
>>>>>> Please also keep in mind that Configuration must not consist of only
>>>>>> strings, it manages a Map<String, Object> for efficient access.
>>>>>> Every
>>>>>> map entry can have a string representation for persistence, but in
>>>>>> most cases consists of unserialized objects.
>>>>>>
>>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite
>>>>>> keys, types or default values. But different layers might want to
>>>>>> add
>>>>>> helpful information. In our concrete use case for FLIP-59,
>>>>>> ExecutionConfig has 50 properties and many of them are not relevant
>>>>>> for the Table layer or have no effect at all. We would like to list
>>>>>> and mention the most important config options again in the Table
>>>>>> Configuration section, so that users are not confused, but with a
>>>>>> strong link to the core option. E.g.: registered kryo serializers
>>>>>> are
>>>>>> also important also for Table users, we would like to add the
>>>>>> comment
>>>>>> "This option allows to modify the serialization of the ANY SQL data
>>>>>> type.". I think we should not spam the core configuration page with
>>>>>> comments from all layers, connectors, or libraries but keep this in
>>>>>> the corresponding component documentation.
>>>>>>
>>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes
>>>>>> and deserializes T from a configuration?
>>>>>>
>>>>>> Regards,
>>>>>> Timo
>>>>>>
>>>>>> On 29.08.19 19:14, Becket Qin wrote:
>>>>>>> Hi Timo and Stephan,
>>>>>>>
>>>>>>> Thanks for the detail explanation.
>>>>>>>
>>>>>>> 1. I agree that each config should be in a human readable
>>>>>>> format. My
>>>>>>> concern is that the current List<Configurable> looks going a little
>>>>>>> too far
>>>>>>> from what the configuration is supposed to do. They are essentially
>>>>>>> creating some Configurable objects instead of defining the
>>>>>>> config to
>>>>>>> create
>>>>>>> those Configurable. This mixes ConfigOption and the usage of it.
>>>>>>> API
>>>>>>> wise
>>>>>>> it would be good to keep the configs and their usages (such as
>>>>>>> how to
>>>>>>> create objects using the ConfigOption) apart from each other.
>>>>>>> I am wondering if we can just make List also only take string. For
>>>>>>> example,
>>>>>>> is the following definition of map and list sufficient?
>>>>>>>
>>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be
>>>>>>> defined
>>>>>>> as:
>>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ...
>>>>>>>
>>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined
>>> as:
>>>>>>> list_config_name: v1, v2, v3, ...
>>>>>>>
>>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String,
>>>>>>> String>>. It
>>>>>>> can
>>>>>>> be defined as:
>>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;....
>>>>>>>
>>>>>>> All the key and values in the configuration are String. This also
>>>>>>> guarantees that the configuration is always serializable.
>>>>>>> If we want to do one more step, we can allow the ConfigOption to
>>>>>>> set
>>> all
>>>>>>> the primitive types and parse that for the users. So something like
>>>>>>> List<Integer>, List<Class<?>> seems fine.
>>>>>>>
>>>>>>> The configuration class could also have util methods to create a
>>>>>>> list
>>> of
>>>>>>> configurable such as:
>>>>>>> <T> List<T>
>>> <Configuration#getConfigurableInstances(ListMapConfigOption,
>>>>>>> Class<T> clazz).
>>>>>>> But the configuration class will not take arbitrary Configurable as
>>> the
>>>>>>> value of its config.
>>>>>>>
>>>>>>> 2. I might have misunderstood this. But my concern on the
>>>>>>> description
>>>>>>> extension is in the following example.
>>>>>>>
>>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM =
>>>>>>>
>>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription(
>>>>>>> "Note: That this property means that a table program has a
>>>>>>> side-effect
>>>>>>> XYZ.");
>>>>>>>
>>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is
>>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I
>>>>>>> suppose
>>>>>>> users
>>>>>>> will see both configurations. One with an extended description
>>>>>>> and one
>>>>>>> without. Let's say there is a third component which also users
>>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM
>>>>>>> ConfigOption? If
>>>>>>> so, what would that ConfigOption's description look like?
>>>>>>>
>>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM
>>>>>>> and the
>>>>>>> description should clearly state all the usage of this
>>>>>>> ConfigOption.
>>>>>>>
>>>>>>> 3. I see, in that case, how about we name it something like
>>>>>>> extractConfiguration()? I am just trying to see if we can make it
>>> clear
>>>>>>> this is not something like fromBytes() and toBytes().
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangjie (Becket) Qin
>>>>>>>
>>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <twal...@apache.org>
>>>>> wrote:
>>>>>>>> Hi Becket,
>>>>>>>>
>>>>>>>> let me try to clarify some of your questions:
>>>>>>>>
>>>>>>>> 1. For every option, we also needed to think about how to
>>>>>>>> represent
>>> it
>>>>>>>> in a human readable format. We do not want to allow arbitrary
>>>>>>>> nesting
>>>>>>>> because that would easily allow to bypass the flattened
>>>>>>>> hierarchy of
>>>>>>>> config options (`session.memory.min`). The current design
>>>>>>>> allows to
>>>>>>>> represent every option type as a list. E.g.:
>>>>>>>>
>>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12`
>>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption:
>>>>>>>> field=12,other=true; field=12,other=true`
>>>>>>>> `myPropertyOption: key=str0,other=str1` can be
>>>>>>>> `myPropertyListOption:
>>>>>>>> key=str0,other=str1;key=str0,other=str1`
>>>>>>>>
>>>>>>>> We need the atomic class for serialization/deserialization both in
>>>>>>>> binary and string format.
>>>>>>>>
>>>>>>>> ConfigOption<List> is not present in the code base yet, but this
>>>>>>>> FLIP is
>>>>>>>> a preparation of making ExecutionConfig configurable. If you look
>>> into
>>>>>>>> this class or also in existing table connectors/formats, you
>>>>>>>> will see
>>>>>>>> that each proposed option type has its requirements.
>>>>>>>>
>>>>>>>> 2. Regarding extending the description of ConfigOptions, the
>>>>>>>> semantic of
>>>>>>>> one option should be a super set of the other option. E.g. in
>>>>>>>> Table
>>> API
>>>>>>>> we might use general ExecutionConfig properties. But we would like
>>>>>>>> to a)
>>>>>>>> make external options more prominent in the Table API config
>>>>>>>> docs to
>>>>>>>> link people to properties they should pay attention b) notice
>>>>>>>> about
>>>>>>>> side
>>>>>>>> effects. The core semantic of a property should not change.
>>>>>>>>
>>>>>>>> 3. The factory will not receive the entire configuration but works
>>> in a
>>>>>>>> separate key space. For `myObjectOption` above, it would receive a
>>>>>>>> configuration that consists of `field: 12` and `other: true`.
>>>>>>>>
>>>>>>>> I agree. I will convert the document into a Wiki page today.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote:
>>>>>>>>> @Becket One thing that may be non-obvious is that the
>>>>>>>>> Configuration
>>>>>>>>> class
>>>>>>>>> also defines serialization / persistence logic at the moment.
>>>>>>>>> So it
>>>>>>>>> needs
>>>>>>>>> to know the set of types it supports. That stands in the way
>>>>>>>>> of an
>>>>>>>>> arbitrary generic map type.
>>>>>>>>>
>>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one
>>>>>>>>> collection orthogonal to the type (List) and another one bound to
>>>>>>>> specific
>>>>>>>>> types (Map).
>>>>>>>>>
>>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <becket....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Timo,
>>>>>>>>>>
>>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I
>>>>>>>>>> have a
>>>>>>>>>> few
>>>>>>>>>> questions / comments.
>>>>>>>>>>
>>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption?
>>>>>>>>>> Would it be enough to just check the atomicClass to see if it
>>>>>>>>>> is a
>>>>>>>>>> List
>>>>>>>> or
>>>>>>>>>> not?
>>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume
>>>>>>>>>> both key
>>>>>>>>>> and value types are String? Can we just apply the same to the
>>>>>>>>>> ConfigOption<List>?
>>>>>>>>>> BTW, I did a quick search in the codebase but did not find any
>>>>>>>>>> usage of
>>>>>>>>>> ConfigOption<List>.
>>>>>>>>>>
>>>>>>>>>> 2. The same config name, but with two ConfigOption with
>>>>>>>>>> different
>>>>>>>> semantic
>>>>>>>>>> in different component seems super confusing. For example, when
>>> users
>>>>>>>> set
>>>>>>>>>> both configs, they may have no idea one is overriding the other.
>>>>>>>>>> There
>>>>>>>>>> might be two cases:
>>>>>>>>>>      - If it is just the same config used by different
>>>>>>>>>> components to
>>>>>>>>>> act
>>>>>>>>>> accordingly, it might be better to just have one config, but
>>> describe
>>>>>>>>>> clearly on how that config will be used.
>>>>>>>>>>      - If it is actually two configurations that can be set
>>>>>>>>>> differently, I
>>>>>>>>>> think the config names should just be different.
>>>>>>>>>>
>>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration()
>>> method
>>>>>>>>>> pretty much means getConfiguration()? The toConfiguration()
>>>>>>>>>> method
>>>>>>>> sounds
>>>>>>>>>> like converting an object to a configuration, which only
>>>>>>>>>> works if
>>> the
>>>>>>>>>> object does not contain any state / value. I am also
>>>>>>>>>> wondering if
>>>>>>>>>> there
>>>>>>>> is
>>>>>>>>>> a real use case of this method. Because supposedly the
>>> configurations
>>>>>>>> could
>>>>>>>>>> just be passed around to caller of this method.
>>>>>>>>>>
>>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of
>>>>>>>>>> in the
>>>>>>>> Google
>>>>>>>>>> doc before voting? The FLIP wiki allows track the modification
>>>>>>>>>> history
>>>>>>>> and
>>>>>>>>>> has a more established structure to ensure nothing is missed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther
>>>>>>>>>> <twal...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the
>>> voting
>>>>>>>>>>> thread. If there are no objections, I will start a new voting
>>> thread
>>>>>>>>>>> tomorrow at 9am Berlin time.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote:
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks for all the feedback we have received online and
>>>>>>>>>>>> offline.
>>> It
>>>>>>>>>>>> showed that many people support the idea of evolving the Flink
>>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP
>>>>>>>>>>>> will
>>>>>>>>>>>> not
>>>>>>>>>>>> solve all issues but at least will improve the current status.
>>>>>>>>>>>>
>>>>>>>>>>>> We've updated the document and replaced the Correlation part
>>>>>>>>>>>> with the
>>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available
>>>>>>>>>>>> options
>>>>>>>>>>>> of a group plus custom group validators for eager validation.
>>>>>>>>>>>> For now,
>>>>>>>>>>>> this eager group validation will only be used at certain
>>>>>>>>>>>> locations in
>>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire
>>>>>>>>>>>> global
>>>>>>>>>>>> configuration before submitting a job in the future.
>>>>>>>>>>>>
>>>>>>>>>>>> Please take another look if you find time. I hope we can
>>>>>>>>>>>> proceed
>>>>>>>>>>>> with
>>>>>>>>>>>> the voting process if there are no objections.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther:
>>>>>>>>>>>>> Hi Stephan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your suggestions. Let me give you some background
>>> about
>>>>>>>>>>>>> the decisions made in this FLIP:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework"
>>>>>>>>>>>>> because we
>>> did
>>>>>>>>>>>>> not want to change the entire configuration infrastructure.
>>>>>>>>>>>>> Both for
>>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that
>>>>>>>>>>>>> would be
>>>>>>>>>>>>> required to update all options. If our goal is to rework the
>>>>>>>>>>>>> configuration option entirely, I might suggest to switch
>>>>>>>>>>>>> to JSON
>>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting
>>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the
>>> more
>>>>>>>>>>>>> nested structures are allowed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is
>>>>>>>>>>>>> centered
>>>>>>>>>>>>> around Java classes where T is determined by the default
>>>>>>>>>>>>> value.
>>>>>>>>>>>>> The
>>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit
>>>>>>>>>>>>> `intType()` method etc. The current design of validators
>>> centered
>>>>>>>>>>>>> around Java classes makes it possible to have typical domain
>>>>>>>>>>>>> validators baked by generics as you suggested. If we
>>>>>>>>>>>>> introduce
>>>>>>>>>>>>> types
>>>>>>>>>>>>> such as "quantity with measure and unit" we still need to
>>>>>>>>>>>>> get a
>>>>>>>>>>>>> class
>>>>>>>>>>>>> out of this option at the end, so why changing a proven
>>>>>>>>>>>>> concept?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary
>>>>>>>>>>>>> nesting. As
>>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every
>>>>>>>>>>>>> atomic
>>>>>>>>>>>>> option like "key=12" can be represented by a list
>>>>>>>>>>>>> "keys=12;13".
>>>>>>>>>>>>> But
>>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated
>>>>>>>>>>>>> list
>>>>>>>>>>>>> option
>>>>>>>>>>>>> would start making this more complicated such as
>>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...),
>>>>>>>>>>>>> StringOption(...)))", do we want that?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4. Correlation: The correlation part is one of the
>>>>>>>>>>>>> suggestions
>>>>>>>>>>>>> that I
>>>>>>>>>>>>> like least in the document. We can also discuss removing it
>>>>>>>>>>>>> entirely,
>>>>>>>>>>>>> but I think it solves the use case of relating options
>>>>>>>>>>>>> with each
>>>>>>>>>>>>> other in a flexible way right next to the actual option.
>>>>>>>>>>>>> Instead of
>>>>>>>>>>>>> being hidden in some component initialization, we should
>>>>>>>>>>>>> put it
>>>>>>>>>>>>> close
>>>>>>>>>>>>> to the option to also perform validation eagerly instead of
>>>>>>>>>>>>> failing
>>>>>>>>>>>>> at runtime when the option is accessed the first time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen:
>>>>>>>>>>>>>> A "List Type" sounds like a good direction to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The
>>>>>>>>>>>>>> idea is
>>>>>>>>>>>>>> to see
>>>>>>>>>>>>>> if something like that can ease validation. Especially the
>>>>>>>>>> correlation
>>>>>>>>>>>>>> system seems quite complex (proxies to work around order of
>>>>>>>>>>>>>> initialization).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, let's assume we don't think primarily about
>>>>>>>>>>>>>> "java
>>>>>>>>>>>>>> types" but
>>>>>>>>>>>>>> would define types as one of the following (just examples,
>>>>>>>>>>>>>> haven't
>>>>>>>>>>>>>> thought
>>>>>>>>>>>>>> all the details through):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        (a) category type: implies string, and a fix set of
>>> possible
>>>>>>>>>> values.
>>>>>>>>>>>>>> Those would be passes and naturally make it into the docs
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> validation.
>>>>>>>>>>>>>> Maps to a String or Enum in Java.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        (b) numeric integer type: implies long (or optionally
>>>>>>>>>>>>>> integer,
>>>>>>>> if
>>>>>>>>>>>>>> we want
>>>>>>>>>>>>>> to automatically check overflow / underflow). would take
>>> typical
>>>>>>>>>> domain
>>>>>>>>>>>>>> validators, like non-negative, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        (c) numeric real type: same as above (double or
>>>>>>>>>>>>>> float)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        (d) numeric interval type: either defined as an
>>> interval, or
>>>>>>>>>>>>>> references
>>>>>>>>>>>>>> other parameter by key. validation by valid interval.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>        (e) quantity: a measure and a unit. separately
>>>>>>>>>>>>>> parsable.
>>> The
>>>>>>>>>>>>>> measure's
>>>>>>>>>>>>>> type could be any of the numeric types above, with same
>>>>>>>>>>>>>> validation
>>>>>>>>>>>>>> rules.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With a system like the above, would we still correlation
>>>>>>>>>>>>>> validators?
>>>>>>>>>>>>>> Are
>>>>>>>>>>>>>> there still cases that we need to catch early (config
>>>>>>>>>>>>>> loading)
>>> or
>>>>>>>>>>>>>> are the
>>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup
>>> specific,
>>>>>>>>>>>>>> that it is
>>>>>>>>>>>>>> fine to handle them in component initialization?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz
>>>>>>>>>>>>>> <dwysakow...@apache.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Stephan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for your opinion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the
>>>>>>>>>>>>>>> most of
>>>>>>>>>> the
>>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown
>>> type
>>>>>>>>>>>>>>> system,
>>>>>>>>>>>>>>> a field like isList may look weird. Please let me
>>>>>>>>>>>>>>> elaborate a
>>>>>>>>>>>>>>> bit
>>>>>>>>>> more
>>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear
>>>>>>>>>>>>>>> enough
>>>>>>>>>>>>>>> about
>>>>>>>>>> it
>>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is
>>>>>>>>>>>>>>> that they
>>>>>>>>>>>>>>> must
>>>>>>>>>>>>>>> have a string representation as they might come from a
>>>>>>>> configuration
>>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so
>>>>>>>>>>>>>>> that the
>>>>>>>>>> values
>>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not
>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>> add a
>>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for
>>>>>>>>>>>>>>> lists
>>>>>>>>>> only
>>>>>>>>>>>>>>> (and flat objects - I think though in the current design
>>>>>>>>>>>>>>> there is a
>>>>>>>>>>>>>>> mistake around the Configurable interface). I think though
>>>>>>>>>>>>>>> you have
>>>>>>>>>> a
>>>>>>>>>>>>>>> point here and it would be better to have a
>>>>>>>>>>>>>>> ListConfigOption
>>>>>>>>>>>>>>> instead of
>>>>>>>>>>>>>>> this field. Does it make sense to you?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I
>>>>>>>>>> understood
>>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from
>>>>>>>>>>>>>>> Configuration that means they can be bound to the generic
>>>>>>>>>>>>>>> parameter
>>>>>>>>>> of
>>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends
>>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the
>>>>>>>>>>>>>>> ConfigOption
>>>>>>>>>>>>>>> has anything to do with the validation logic. Could you
>>>>>>>>>>>>>>> elaborate a
>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>> more what did you mean?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote:
>>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do
>>> early
>>>>>>>>>>>>>>> validation.
>>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad
>>>>>>>>>>>>>>>> hoc,
>>>>>>>>>>>>>>>> though. For
>>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear
>>>>>>>>>>>>>>>> indication of
>>> not
>>>>>>>>>>>>>>>> having
>>>>>>>>>>>>>>>> thought through the type/category system.
>>>>>>>>>>>>>>>> Also, having a more clear category system makes validation
>>>>>>>> simpler.
>>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between
>>> numeric
>>>>>>>>>>>>>>> parameters
>>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible
>>>>>>>>>>>>>>>> values),
>>>>>>>>>>>>>>>> quantities
>>>>>>>>>>>>>>>> like duration and memory size (need measure and unit),
>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>> results in
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>> elegant system for validation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <
>>>>>>>>>> lzljs3620...@aliyun.com
>>>>>>>>>>>>>>> .invalid>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design.
>>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of
>>>>>>>>>>>>>>>>> various
>>>>>>>>>>>>>>>>>       modules to be unified.
>>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new
>>>>>>>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>> TableEnvironment available for production.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations,
>>>>>>>>>>>>>>>>> such as
>>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The
>>>>>>>>>>>>>>>>> skew may
>>>>>>>>>>>>>>>>>       be a single field or a combination of multiple
>>>>>>>>>>>>>>>>> fields.
>>>>>>>>>>>>>>>>> So the
>>>>>>>>>>>>>>>>>       configuration is very troublesome. We used JSON
>>>>>>>>>>>>>>>>> string
>>> to
>>>>>>>>>>>>>>>>>       configure it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Jingsong Lee
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>>>>>>>>> From:Jark Wu <imj...@gmail.com>
>>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44
>>>>>>>>>>>>>>>>> To:dev <dev@flink.apache.org>
>>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and
>>>>>>>>>> Configuration
>>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a
>>> long
>>>>>>>>>> time.
>>>>>>>>>>>>>>>>> We have seen the benefit when developing blink
>>>>>>>>>>>>>>>>> configurations and
>>>>>>>>>>>>>>> connector
>>>>>>>>>>>>>>>>> properties in 1.9 release.
>>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed
>>>>>>>>>>>>>>>>> design.
>>>>>>>>>>>>>>>>> I will leave my thoughts and comments there.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <
>>> wander4...@gmail.com
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which
>>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of
>>>>>>>>>>>>>>>>>> Flink.
>>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment
>>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by
>>>>>>>>>> configuration.
>>>>>>>>>>>>>>>>>> Will take a look at the document in days.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> tison.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> 于2019年8月16日周五
>>>>>>>>>>>>>>>>>> 下午10:12写道：
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of
>>>>>>>>>>>>>>>>>>> ExecutionConfig and
>>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is
>>>>>>>>>>>>>>>>>>> necessary
>>>>>>>>>>>>>>>>>>> to make
>>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally,
>>>>>>>>>>>>>>>>>>> with
>>> the
>>>>>>>>>>>>>>>>>>> new SQL
>>>>>>>>>>>>>>>>> DDL
>>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and
>>>>>>>>>>>>>>>>>>> formats
>>>>>>>>>>>>>>>>>>> coming up,
>>>>>>>>>>>>>>>>>>> unified configuration becomes more important.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We need more features around string-based configuration
>>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>> future,
>>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose
>>>>>>>>>>>>>>>>>>> FLIP-54 for
>>>>>>>>>>>>>>>>>>> evolving
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit
>>>
>>>>>>>>>>>>>>>>>>> In summary it adds:
>>>>>>>>>>>>>>>>>>> - documented types and validation
>>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list
>>>>>>>>>>>>>>>>>>> - simple non-nested object types
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Looking forward to your feedback,
>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Reply via email to