Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Becket Qin Mon, 02 Sep 2019 05:40:45 -0700

Hi Timo and Dawid,

Thanks for the patient reply. I agree that both option a) and option b) can
solve the mutability problem.


For option a), is it a little intrusive to add a duplicate() method for a
Configurable? It would be great if we don't put this burden on users if
possible.

For option b), I actually feel it is slightly better than option a) from
API perspective as getFactory() seems a more understandable method of a
Configurable compared with duplicate(). And users do not need to implement
much more logic.

I am curious what is the downside of keeping the Configuration simple to
only have primitive types, and always create the Configurable using a util
method? If Configurables are rare, do we need to put the instantiation /
bookkeeping logic in Configuration?

 @Becket for the toConfiguration this is required for shipping the
> Configuration to TaskManager, so that we do not have to use java
> serializability.

Do you mean a Configurable may be created and configured directly without
reading settings from a Configuration instance? I thought a Configurable
will always be created via a ConfigurableFactory by extracting required
configs from a Configuration. Am I missing something?

Thanks,

Jiangjie (Becket) Qin

On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <[email protected]>
wrote:

> Hi Timo, Becket
>
> From the options that Timo suggested for improving the mutability
> situation I would prefer option a) as this is the more explicit option
> and simpler option. Just as a remark, I think in general Configurable
> types for options will be rather very rare for some special use-cases,
> as the majority of options are rather simpler parameters of primitive
> types (+duration, memory)
>
> @Becket for the toConfiguration this is required for shipping the
> Configuration to TaskManager, so that we do not have to use java
> serializability.
>
> Best,
>
> Dawid
>
>
> On 02/09/2019 10:05, Timo Walther wrote:
> > Hi Becket,
> >
> > Re 1 & 3: "values in configurations should actually be immutable"
> >
> > I would also prefer immutability but most of our configuration is
> > mutable due to serialization/deserialization. Also maps and list could
> > be mutable in theory. It is difficult to really enforce that for
> > nested structures. I see two options:
> >
> > a) For the original design: How about we force implementers to add a
> > duplicate() method in a Configurable object? This would mean that the
> > object is still mutable but by duplicating the object both during
> > reading and writing we would avoid the problem you described.
> >
> > b) For the current design: We still use the factory approach but let a
> > Configurable object implement a getFactory() method such that we know
> > how to serialize the object. With the help of a factory we can also
> > duplicate the object easily during reading and writing and ensure
> > immutability.
> >
> > I would personally go for approach a) to not over-engineer this topic.
> > But I'm open for option b).
> >
> > Regards,
> > Timo
> >
> >
> > On 31.08.19 04:09, Becket Qin wrote:
> >> Hi Timo,
> >>
> >> Thanks for the reply. I am still a little concerned over the
> >> mutability of
> >> the Configurable which could be the value in Configuration.
> >>
> >> Re: 1
> >>
> >>> But in general, people should not use any internal fields.
> >>> Configurable objects are meant for simple little helper POJOs, not
> >>> complex arbitrary nested data structures.
> >> This seems difficult to enforce... Ideally the values in configurations
> >> should actually be immutable. The value can only be changed by
> >> explicitly
> >> calling setters in Configuration. Otherwise we may have weird situation
> >> where the Configurable in the same configuration are different in two
> >> places because the configurable is modified in one places and not
> >> modified
> >> in another place. So I am a little concerned on putting a
> >> Configurable type
> >> in the Configuration map, because the value could be modified without
> >> explicitly setting the configuration. For example, can users do the
> >> following?
> >>
> >> Configurable configurable =
> >> configuration.getConfigurable(myConfigurableOption);
> >> configurable.setConfigA(123); // this already changes the configurable
> >> object in the configuration.
> >>
> >> Re: 2
> >> Thanks for confirming. As long as users will not have a situation where
> >> they need to set two configurations with the same key but different
> >> descriptions, I think it is OK.
> >>
> >> Re: 3
> >> This is actually kind of related to 1. i.e. Whether toConfiguration()
> >> guarantees the exact same object can be rebuilt from the
> >> configuration or
> >> not. I am still not quite sure about the use case of toConfiguration()
> >> though. It seems indicating the Configurable is mutable, which might be
> >> dangerous.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Fri, Aug 30, 2019 at 10:04 PM Timo Walther <[email protected]>
> >> wrote:
> >>
> >>> Hi Becket,
> >>>
> >>> 1. First of all, you are totally right. The FLIP contains a bug due to
> >>> the last minute changes that Dawid suggested: by having immutable
> >>> objects created by a factory we loose the serializability of the
> >>> Configuration because the factory itself is not stored in the
> >>> Configuration. I would propose to revert the last change and stick to
> >>> the original design, which means that a object must implement the
> >>> Configurable interface and also implements
> >>> serialization/deserialization
> >>> methods such that also internal fields can be persisted as you
> >>> suggested. But in general, people should not use any internal fields.
> >>> Configurable objects are meant for simple little helper POJOs, not
> >>> complex arbitrary nested data structures.
> >>>
> >>> It is Map<String, Object> because Configuration stores the raw objects.
> >>> If you put a Boolean option into it, it remains Boolean. This makes the
> >>> map very efficient for shipping to the cluster and accessing options
> >>> multiple times. The same for configurable objects. We put the pure
> >>> objects into the map without any serialization/deserialization. The
> >>> provided factory allows to convert the Object into a Configuration and
> >>> we know how to serialize/deserializise a configuration because it is
> >>> just a key/value map.
> >>>
> >>> 2. Yes, this is what we had in mind. It should still be the same
> >>> configuration option. We would like to avoid specialized option keys
> >>> across components (exec.max-para and table.exec.max-para) if they are
> >>> describing basically the same thing. But adding some more description
> >>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2"
> >>> does not hurt.
> >>>
> >>> 3. They should restore the original object given that the
> >>> toConfiguration/fromConfiguration methods have been implemented
> >>> correctly. I will extend the example to make the logic clearer while
> >>> fixing the bug.
> >>>
> >>> Thanks for the healthy discussion,
> >>> Timo
> >>>
> >>>
> >>> On 30.08.19 15:29, Becket Qin wrote:
> >>>> Hi Timo,
> >>>>
> >>>> Thanks again for the clarification. Please see a few more questions
> >>> below.
> >>>> Re: 1
> >>>>
> >>>>> Please also keep in mind that Configuration must not consist of only
> >>>>> strings, it manages a Map<String, Object> for efficient access. Every
> >>>>> map entry can have a string representation for persistence, but in
> >>>>> most
> >>>>> cases consists of unserialized objects.
> >>>> I'd like to understand this a bit more. The reason we have a
> >>>> Map<String,
> >>>> Object> in Configuration was because that Object could be either a
> >>> String,
> >>>> a List or a Map, right? But they eventually always boil down to
> >>>> Strings,
> >>> or
> >>>> maybe one of the predefined type that we know how to serialize. In the
> >>>> current design, can the Object also be Configurable?
> >>>> If the value in the config Map<String, Object> can be Configurable
> >>> objects,
> >>>> how do we serialize them? Calling toConfiguration() seems not quite
> >>> working
> >>>> because there might be some other internal fields that are not part of
> >>> the
> >>>> configuration. The modification to those fields will be lost if we
> >>>> simply
> >>>> use toConfiguration(). So the impact of modifying those Configurable
> >>>> objects seems a little undefined. And it would be difficult to prevent
> >>>> users from doing that.
> >>>> If the value in the config Map<String, Object> cannot be Configurable
> >>>> objects, then it seems a little weird to have all the other ConfigType
> >>>> stored in the ConfigMap in their own native type and accessed via
> >>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be
> >>>> different from others because one have to use ConfigurableFactory
> >>>> to get
> >>>> the configurations.
> >>>>
> >>>> Re: 2
> >>>>
> >>>>> I think about the withExtendedDescription as a helper getter in a
> >>>>> different place, so that the option is easier to find in a different
> >>>>> module from it was defined.
> >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be
> >>>>> equal
> >>> to:
> >>>>> public ConfigOption getMaxParallelismOption() {
> >>>>>       return CoreOptions.MAX_PARALLELISM;
> >>>>> }
> >>>> Just to make sure I understand it correctly, does that mean users will
> >>> see
> >>>> something like following?
> >>>>    - CoreOptions.MAX_PARALLELISM with description_1;
> >>>>    - TableOptions.MAX_PARALLELISM with description_1 + description_2.
> >>>>    - DataStreamOptions.MAX_PARALLELISM with description_1 +
> >>>> description_3.
> >>>> And users will only configure exactly one MAX_PARALLELISM cross the
> >>> board.
> >>>> So they won't be confused by setting two MAX_PARALLELISM config for
> >>>> two
> >>>> different modules, while they are actually the same config. If that is
> >>> the
> >>>> case, I don't have further concern.
> >>>>
> >>>> Re: 3
> >>>> Maybe I am thinking too much. I thought toBytes() / fromBytes()
> >>>> actually
> >>>> restore the original Object. But fromConfiguration() and
> >>> toConfiguration()
> >>>> does not do that, anything not in the configuration of the original
> >>> object
> >>>> will be lost. So it would be good to make that clear. Maybe a clear
> >>>> Java
> >>>> doc is sufficient.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>> On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz
> >>>> <[email protected]
> >>>>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Ad. 1
> >>>>>
> >>>>> The advantage of our approach is that you have the type definition
> >>>>> close
> >>>>> to the option definition. The only difference is that it enables
> >>>>> expressing simple pojos with the primitives like
> >>>>> ConfigOption<Integer>,
> >>>>> ConfigOption<Long> etc. Otherwise as Timo said you will start having
> >>>>>
> >>>>> the parsing logic scattered everywhere in the code base as it is now.
> >>>>> The string representation in our proposal is exactly the same as you
> >>>>> explained for those three options. The only difference is that you
> >>>>> don't
> >>>>> have to parse the elements of a List, Map etc. afterwards.
> >>>>>
> >>>>> Ad. 2
> >>>>>
> >>>>> I think about the withExtendedDescription as a helper getter in a
> >>>>> different place, so that the option is easier to find in a different
> >>>>> module from it was defined.
> >>>>>
> >>>>> The MAX_PARALLELISM option in TableOptions would conceptually be
> >>>>> equal
> >>> to:
> >>>>> public ConfigOption getMaxParallelismOption() {
> >>>>>
> >>>>>       return CoreOptions.MAX_PARALLELISM;
> >>>>>
> >>>>> }
> >>>>>
> >>>>> This allows to further clarify the description of the option in the
> >>>>> context of a different module and end up in a seperate page in
> >>>>> documentation (but with a link to the original one). In the end it is
> >>>>> exactly the same option. It has exactly same key, type, parsing
> >>>>> logic,
> >>>>> it is in the end forwarded to the same place.
> >>>>>
> >>>>> Ad. 3
> >>>>>
> >>>>> Not sure if I understand your concerns here. As Timo said it is in
> >>>>> the
> >>>>> end sth similar to toBytes/fromBytes, but it puts itself to a
> >>>>> Configuration. Also just wanted to make sure we adjusted this part
> >>>>> slightly and now the ConfigOption takes ConfigurableFactory.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Dawid
> >>>>>
> >>>>>
> >>>>> On 30/08/2019 09:39, Timo Walther wrote:
> >>>>>> Hi Becket,
> >>>>>>
> >>>>>> thanks for the discussion.
> >>>>>>
> >>>>>> 1. ConfigOptions in their current design are bound to classes.
> >>>>>> Regarding, the option is "creating some Configurable objects instead
> >>>>>> of defining the config to create
> >>>>>> those Configurable"? We just moved this logic to a factory, this
> >>>>>> factory can then also be used for other purposes. However, how the
> >>>>>> option and objects are serialized to Configuration is still not part
> >>>>>> of the option. The option is just pure declaration.
> >>>>>>
> >>>>>> If we would allow only List<String>, implementers would need to
> >>>>>> start
> >>>>>> implementing own parsing and validation logic all the time. We would
> >>>>>> like to avoid that.
> >>>>>>
> >>>>>> Please also keep in mind that Configuration must not consist of only
> >>>>>> strings, it manages a Map<String, Object> for efficient access.
> >>>>>> Every
> >>>>>> map entry can have a string representation for persistence, but in
> >>>>>> most cases consists of unserialized objects.
> >>>>>>
> >>>>>> 2. MAX_PARALLELISM is still defined just once. We don't overwrite
> >>>>>> keys, types or default values. But different layers might want to
> >>>>>> add
> >>>>>> helpful information. In our concrete use case for FLIP-59,
> >>>>>> ExecutionConfig has 50 properties and many of them are not relevant
> >>>>>> for the Table layer or have no effect at all. We would like to list
> >>>>>> and mention the most important config options again in the Table
> >>>>>> Configuration section, so that users are not confused, but with a
> >>>>>> strong link to the core option. E.g.: registered kryo serializers
> >>>>>> are
> >>>>>> also important also for Table users, we would like to add the
> >>>>>> comment
> >>>>>> "This option allows to modify the serialization of the ANY SQL data
> >>>>>> type.". I think we should not spam the core configuration page with
> >>>>>> comments from all layers, connectors, or libraries but keep this in
> >>>>>> the corresponding component documentation.
> >>>>>>
> >>>>>> 3. But it is something like fromBytes() and toBytes()? It serializes
> >>>>>> and deserializes T from a configuration?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>> On 29.08.19 19:14, Becket Qin wrote:
> >>>>>>> Hi Timo and Stephan,
> >>>>>>>
> >>>>>>> Thanks for the detail explanation.
> >>>>>>>
> >>>>>>> 1. I agree that each config should be in a human readable
> >>>>>>> format. My
> >>>>>>> concern is that the current List<Configurable> looks going a little
> >>>>>>> too far
> >>>>>>> from what the configuration is supposed to do. They are essentially
> >>>>>>> creating some Configurable objects instead of defining the
> >>>>>>> config to
> >>>>>>> create
> >>>>>>> those Configurable. This mixes ConfigOption and the usage of it.
> >>>>>>> API
> >>>>>>> wise
> >>>>>>> it would be good to keep the configs and their usages (such as
> >>>>>>> how to
> >>>>>>> create objects using the ConfigOption) apart from each other.
> >>>>>>> I am wondering if we can just make List also only take string. For
> >>>>>>> example,
> >>>>>>> is the following definition of map and list sufficient?
> >>>>>>>
> >>>>>>> A MapConfigOption is ConfigOption<Map<String, String>>. It can be
> >>>>>>> defined
> >>>>>>> as:
> >>>>>>> map_config_name: k1=v1, k2=v2, k3=v3, ...
> >>>>>>>
> >>>>>>> A ListConfigOption is ConfigOption<List<String>>. It can be defined
> >>> as:
> >>>>>>> list_config_name: v1, v2, v3, ...
> >>>>>>>
> >>>>>>> A ListOfMapConfigOption is ConfigOption<List<Map<String,
> >>>>>>> String>>. It
> >>>>>>> can
> >>>>>>> be defined as:
> >>>>>>> list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;....
> >>>>>>>
> >>>>>>> All the key and values in the configuration are String. This also
> >>>>>>> guarantees that the configuration is always serializable.
> >>>>>>> If we want to do one more step, we can allow the ConfigOption to
> >>>>>>> set
> >>> all
> >>>>>>> the primitive types and parse that for the users. So something like
> >>>>>>> List<Integer>, List<Class<?>> seems fine.
> >>>>>>>
> >>>>>>> The configuration class could also have util methods to create a
> >>>>>>> list
> >>> of
> >>>>>>> configurable such as:
> >>>>>>> <T> List<T>
> >>> <Configuration#getConfigurableInstances(ListMapConfigOption,
> >>>>>>> Class<T> clazz).
> >>>>>>> But the configuration class will not take arbitrary Configurable as
> >>> the
> >>>>>>> value of its config.
> >>>>>>>
> >>>>>>> 2. I might have misunderstood this. But my concern on the
> >>>>>>> description
> >>>>>>> extension is in the following example.
> >>>>>>>
> >>>>>>> public static final ConfigOption<Integer> MAX_PARALLELISM =
> >>>>>>>
> >>>>>>> CoreOptions.MAX_PARALLELISM.withExtendedDescription(
> >>>>>>> "Note: That this property means that a table program has a
> >>>>>>> side-effect
> >>>>>>> XYZ.");
> >>>>>>>
> >>>>>>> In this case, we will have two MAX_PARALLELISM configs now. One is
> >>>>>>> CoreOptions.MAX_PARALLELISM. The other one is defined here. I
> >>>>>>> suppose
> >>>>>>> users
> >>>>>>> will see both configurations. One with an extended description
> >>>>>>> and one
> >>>>>>> without. Let's say there is a third component which also users
> >>>>>>> MAX_PARALLELISM, will there be yet another MAX_PARALLELISM
> >>>>>>> ConfigOption? If
> >>>>>>> so, what would that ConfigOption's description look like?
> >>>>>>>
> >>>>>>> Ideally, we would want to have just one CoreOptions.MAX_PARALLELISM
> >>>>>>> and the
> >>>>>>> description should clearly state all the usage of this
> >>>>>>> ConfigOption.
> >>>>>>>
> >>>>>>> 3. I see, in that case, how about we name it something like
> >>>>>>> extractConfiguration()? I am just trying to see if we can make it
> >>> clear
> >>>>>>> this is not something like fromBytes() and toBytes().
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Jiangjie (Becket) Qin
> >>>>>>>
> >>>>>>> On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <[email protected]>
> >>>>> wrote:
> >>>>>>>> Hi Becket,
> >>>>>>>>
> >>>>>>>> let me try to clarify some of your questions:
> >>>>>>>>
> >>>>>>>> 1. For every option, we also needed to think about how to
> >>>>>>>> represent
> >>> it
> >>>>>>>> in a human readable format. We do not want to allow arbitrary
> >>>>>>>> nesting
> >>>>>>>> because that would easily allow to bypass the flattened
> >>>>>>>> hierarchy of
> >>>>>>>> config options (`session.memory.min`). The current design
> >>>>>>>> allows to
> >>>>>>>> represent every option type as a list. E.g.:
> >>>>>>>>
> >>>>>>>> `myIntOption: 12` can be `myIntListOption: 12;12`
> >>>>>>>> `myObjectOption: field=12,other=true` can be `myObjectListOption:
> >>>>>>>> field=12,other=true; field=12,other=true`
> >>>>>>>> `myPropertyOption: key=str0,other=str1` can be
> >>>>>>>> `myPropertyListOption:
> >>>>>>>> key=str0,other=str1;key=str0,other=str1`
> >>>>>>>>
> >>>>>>>> We need the atomic class for serialization/deserialization both in
> >>>>>>>> binary and string format.
> >>>>>>>>
> >>>>>>>> ConfigOption<List> is not present in the code base yet, but this
> >>>>>>>> FLIP is
> >>>>>>>> a preparation of making ExecutionConfig configurable. If you look
> >>> into
> >>>>>>>> this class or also in existing table connectors/formats, you
> >>>>>>>> will see
> >>>>>>>> that each proposed option type has its requirements.
> >>>>>>>>
> >>>>>>>> 2. Regarding extending the description of ConfigOptions, the
> >>>>>>>> semantic of
> >>>>>>>> one option should be a super set of the other option. E.g. in
> >>>>>>>> Table
> >>> API
> >>>>>>>> we might use general ExecutionConfig properties. But we would like
> >>>>>>>> to a)
> >>>>>>>> make external options more prominent in the Table API config
> >>>>>>>> docs to
> >>>>>>>> link people to properties they should pay attention b) notice
> >>>>>>>> about
> >>>>>>>> side
> >>>>>>>> effects. The core semantic of a property should not change.
> >>>>>>>>
> >>>>>>>> 3. The factory will not receive the entire configuration but works
> >>> in a
> >>>>>>>> separate key space. For `myObjectOption` above, it would receive a
> >>>>>>>> configuration that consists of `field: 12` and `other: true`.
> >>>>>>>>
> >>>>>>>> I agree. I will convert the document into a Wiki page today.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Timo
> >>>>>>>>
> >>>>>>>> On 29.08.19 09:00, Stephan Ewen wrote:
> >>>>>>>>> @Becket One thing that may be non-obvious is that the
> >>>>>>>>> Configuration
> >>>>>>>>> class
> >>>>>>>>> also defines serialization / persistence logic at the moment.
> >>>>>>>>> So it
> >>>>>>>>> needs
> >>>>>>>>> to know the set of types it supports. That stands in the way
> >>>>>>>>> of an
> >>>>>>>>> arbitrary generic map type.
> >>>>>>>>>
> >>>>>>>>> @Timo I agree though that it seems a bit inconsistent to have one
> >>>>>>>>> collection orthogonal to the type (List) and another one bound to
> >>>>>>>> specific
> >>>>>>>>> types (Map).
> >>>>>>>>>
> >>>>>>>>> On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <[email protected]
> >
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Timo,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the proposal. Sorry for the late comments, but I
> >>>>>>>>>> have a
> >>>>>>>>>> few
> >>>>>>>>>> questions / comments.
> >>>>>>>>>>
> >>>>>>>>>> 1. Is a new field of isList necessary in the ConfigOption?
> >>>>>>>>>> Would it be enough to just check the atomicClass to see if it
> >>>>>>>>>> is a
> >>>>>>>>>> List
> >>>>>>>> or
> >>>>>>>>>> not?
> >>>>>>>>>> Also, in the ConfigOption<Map> class case, are we always assume
> >>>>>>>>>> both key
> >>>>>>>>>> and value types are String? Can we just apply the same to the
> >>>>>>>>>> ConfigOption<List>?
> >>>>>>>>>> BTW, I did a quick search in the codebase but did not find any
> >>>>>>>>>> usage of
> >>>>>>>>>> ConfigOption<List>.
> >>>>>>>>>>
> >>>>>>>>>> 2. The same config name, but with two ConfigOption with
> >>>>>>>>>> different
> >>>>>>>> semantic
> >>>>>>>>>> in different component seems super confusing. For example, when
> >>> users
> >>>>>>>> set
> >>>>>>>>>> both configs, they may have no idea one is overriding the other.
> >>>>>>>>>> There
> >>>>>>>>>> might be two cases:
> >>>>>>>>>>      - If it is just the same config used by different
> >>>>>>>>>> components to
> >>>>>>>>>> act
> >>>>>>>>>> accordingly, it might be better to just have one config, but
> >>> describe
> >>>>>>>>>> clearly on how that config will be used.
> >>>>>>>>>>      - If it is actually two configurations that can be set
> >>>>>>>>>> differently, I
> >>>>>>>>>> think the config names should just be different.
> >>>>>>>>>>
> >>>>>>>>>> 3. Regarding the ConfigurableFactory, is the toConfiguration()
> >>> method
> >>>>>>>>>> pretty much means getConfiguration()? The toConfiguration()
> >>>>>>>>>> method
> >>>>>>>> sounds
> >>>>>>>>>> like converting an object to a configuration, which only
> >>>>>>>>>> works if
> >>> the
> >>>>>>>>>> object does not contain any state / value. I am also
> >>>>>>>>>> wondering if
> >>>>>>>>>> there
> >>>>>>>> is
> >>>>>>>>>> a real use case of this method. Because supposedly the
> >>> configurations
> >>>>>>>> could
> >>>>>>>>>> just be passed around to caller of this method.
> >>>>>>>>>>
> >>>>>>>>>> Also, can you put the proposal into the FLIP wiki instead of
> >>>>>>>>>> in the
> >>>>>>>> Google
> >>>>>>>>>> doc before voting? The FLIP wiki allows track the modification
> >>>>>>>>>> history
> >>>>>>>> and
> >>>>>>>>>> has a more established structure to ensure nothing is missed.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Aug 27, 2019 at 11:34 PM Timo Walther
> >>>>>>>>>> <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>
> >>>>>>>>>>> I updated the FLIP proposal one more time as mentioned in the
> >>> voting
> >>>>>>>>>>> thread. If there are no objections, I will start a new voting
> >>> thread
> >>>>>>>>>>> tomorrow at 9am Berlin time.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Timo
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 22.08.19 14:19, Timo Walther wrote:
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for all the feedback we have received online and
> >>>>>>>>>>>> offline.
> >>> It
> >>>>>>>>>>>> showed that many people support the idea of evolving the Flink
> >>>>>>>>>>>> configuration functionality. I'm almost sure that this FLIP
> >>>>>>>>>>>> will
> >>>>>>>>>>>> not
> >>>>>>>>>>>> solve all issues but at least will improve the current status.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We've updated the document and replaced the Correlation part
> >>>>>>>>>>>> with the
> >>>>>>>>>>>> concept of a ConfigOptionGroup that can provide all available
> >>>>>>>>>>>> options
> >>>>>>>>>>>> of a group plus custom group validators for eager validation.
> >>>>>>>>>>>> For now,
> >>>>>>>>>>>> this eager group validation will only be used at certain
> >>>>>>>>>>>> locations in
> >>>>>>>>>>>> the Flink code but it prepares for maybe validating the entire
> >>>>>>>>>>>> global
> >>>>>>>>>>>> configuration before submitting a job in the future.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please take another look if you find time. I hope we can
> >>>>>>>>>>>> proceed
> >>>>>>>>>>>> with
> >>>>>>>>>>>> the voting process if there are no objections.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am 19.08.19 um 12:54 schrieb Timo Walther:
> >>>>>>>>>>>>> Hi Stephan,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> thanks for your suggestions. Let me give you some background
> >>> about
> >>>>>>>>>>>>> the decisions made in this FLIP:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework"
> >>>>>>>>>>>>> because we
> >>> did
> >>>>>>>>>>>>> not want to change the entire configuration infrastructure.
> >>>>>>>>>>>>> Both for
> >>>>>>>>>>>>> backwards-compatibility reasons and the amount of work that
> >>>>>>>>>>>>> would be
> >>>>>>>>>>>>> required to update all options. If our goal is to rework the
> >>>>>>>>>>>>> configuration option entirely, I might suggest to switch
> >>>>>>>>>>>>> to JSON
> >>>>>>>>>>>>> format with JSON schema and JSON validator. However, setting
> >>>>>>>>>>>>> properties in a CLI or web interface becomes more tricky the
> >>> more
> >>>>>>>>>>>>> nested structures are allowed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Class-based Options: The current ConfigOption<T> class is
> >>>>>>>>>>>>> centered
> >>>>>>>>>>>>> around Java classes where T is determined by the default
> >>>>>>>>>>>>> value.
> >>>>>>>>>>>>> The
> >>>>>>>>>>>>> FLIP just makes this more explicit by offering an explicit
> >>>>>>>>>>>>> `intType()` method etc. The current design of validators
> >>> centered
> >>>>>>>>>>>>> around Java classes makes it possible to have typical domain
> >>>>>>>>>>>>> validators baked by generics as you suggested. If we
> >>>>>>>>>>>>> introduce
> >>>>>>>>>>>>> types
> >>>>>>>>>>>>> such as "quantity with measure and unit" we still need to
> >>>>>>>>>>>>> get a
> >>>>>>>>>>>>> class
> >>>>>>>>>>>>> out of this option at the end, so why changing a proven
> >>>>>>>>>>>>> concept?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 3. List Options: The `isList` prevents having arbitrary
> >>>>>>>>>>>>> nesting. As
> >>>>>>>>>>>>> Dawid mentioned, we kept human readability in mind. For every
> >>>>>>>>>>>>> atomic
> >>>>>>>>>>>>> option like "key=12" can be represented by a list
> >>>>>>>>>>>>> "keys=12;13".
> >>>>>>>>>>>>> But
> >>>>>>>>>>>>> we don't want to go further; esp. no nesting. A dedicated
> >>>>>>>>>>>>> list
> >>>>>>>>>>>>> option
> >>>>>>>>>>>>> would start making this more complicated such as
> >>>>>>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...),
> >>>>>>>>>>>>> StringOption(...)))", do we want that?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 4. Correlation: The correlation part is one of the
> >>>>>>>>>>>>> suggestions
> >>>>>>>>>>>>> that I
> >>>>>>>>>>>>> like least in the document. We can also discuss removing it
> >>>>>>>>>>>>> entirely,
> >>>>>>>>>>>>> but I think it solves the use case of relating options
> >>>>>>>>>>>>> with each
> >>>>>>>>>>>>> other in a flexible way right next to the actual option.
> >>>>>>>>>>>>> Instead of
> >>>>>>>>>>>>> being hidden in some component initialization, we should
> >>>>>>>>>>>>> put it
> >>>>>>>>>>>>> close
> >>>>>>>>>>>>> to the option to also perform validation eagerly instead of
> >>>>>>>>>>>>> failing
> >>>>>>>>>>>>> at runtime when the option is accessed the first time.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen:
> >>>>>>>>>>>>>> A "List Type" sounds like a good direction to me.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The comment on the type system was a bit brief, I agree. The
> >>>>>>>>>>>>>> idea is
> >>>>>>>>>>>>>> to see
> >>>>>>>>>>>>>> if something like that can ease validation. Especially the
> >>>>>>>>>> correlation
> >>>>>>>>>>>>>> system seems quite complex (proxies to work around order of
> >>>>>>>>>>>>>> initialization).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For example, let's assume we don't think primarily about
> >>>>>>>>>>>>>> "java
> >>>>>>>>>>>>>> types" but
> >>>>>>>>>>>>>> would define types as one of the following (just examples,
> >>>>>>>>>>>>>> haven't
> >>>>>>>>>>>>>> thought
> >>>>>>>>>>>>>> all the details through):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        (a) category type: implies string, and a fix set of
> >>> possible
> >>>>>>>>>> values.
> >>>>>>>>>>>>>> Those would be passes and naturally make it into the docs
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>> validation.
> >>>>>>>>>>>>>> Maps to a String or Enum in Java.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        (b) numeric integer type: implies long (or optionally
> >>>>>>>>>>>>>> integer,
> >>>>>>>> if
> >>>>>>>>>>>>>> we want
> >>>>>>>>>>>>>> to automatically check overflow / underflow). would take
> >>> typical
> >>>>>>>>>> domain
> >>>>>>>>>>>>>> validators, like non-negative, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        (c) numeric real type: same as above (double or
> >>>>>>>>>>>>>> float)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        (d) numeric interval type: either defined as an
> >>> interval, or
> >>>>>>>>>>>>>> references
> >>>>>>>>>>>>>> other parameter by key. validation by valid interval.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        (e) quantity: a measure and a unit. separately
> >>>>>>>>>>>>>> parsable.
> >>> The
> >>>>>>>>>>>>>> measure's
> >>>>>>>>>>>>>> type could be any of the numeric types above, with same
> >>>>>>>>>>>>>> validation
> >>>>>>>>>>>>>> rules.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With a system like the above, would we still correlation
> >>>>>>>>>>>>>> validators?
> >>>>>>>>>>>>>> Are
> >>>>>>>>>>>>>> there still cases that we need to catch early (config
> >>>>>>>>>>>>>> loading)
> >>> or
> >>>>>>>>>>>>>> are the
> >>>>>>>>>>>>>> remaining cases sufficiently rare and runtime or setup
> >>> specific,
> >>>>>>>>>>>>>> that it is
> >>>>>>>>>>>>>> fine to handle them in component initialization?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz
> >>>>>>>>>>>>>> <[email protected]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Stephan,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thank you for your opinion.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Actually list/composite types are the topics we spent the
> >>>>>>>>>>>>>>> most of
> >>>>>>>>>> the
> >>>>>>>>>>>>>>> time. I understand that from a perspective of a full blown
> >>> type
> >>>>>>>>>>>>>>> system,
> >>>>>>>>>>>>>>> a field like isList may look weird. Please let me
> >>>>>>>>>>>>>>> elaborate a
> >>>>>>>>>>>>>>> bit
> >>>>>>>>>> more
> >>>>>>>>>>>>>>> on the reason behind it though. Maybe we weren't clear
> >>>>>>>>>>>>>>> enough
> >>>>>>>>>>>>>>> about
> >>>>>>>>>> it
> >>>>>>>>>>>>>>> in the FLIP. The key feature of all the conifg options is
> >>>>>>>>>>>>>>> that they
> >>>>>>>>>>>>>>> must
> >>>>>>>>>>>>>>> have a string representation as they might come from a
> >>>>>>>> configuration
> >>>>>>>>>>>>>>> file. Moreover it must be a human readable format, so
> >>>>>>>>>>>>>>> that the
> >>>>>>>>>> values
> >>>>>>>>>>>>>>> might be manually adjusted. Having that in mind we did not
> >>>>>>>>>>>>>>> want to
> >>>>>>>>>>>>>>> add a
> >>>>>>>>>>>>>>> support of an arbitrary nesting and we decided to allow for
> >>>>>>>>>>>>>>> lists
> >>>>>>>>>> only
> >>>>>>>>>>>>>>> (and flat objects - I think though in the current design
> >>>>>>>>>>>>>>> there is a
> >>>>>>>>>>>>>>> mistake around the Configurable interface). I think though
> >>>>>>>>>>>>>>> you have
> >>>>>>>>>> a
> >>>>>>>>>>>>>>> point here and it would be better to have a
> >>>>>>>>>>>>>>> ListConfigOption
> >>>>>>>>>>>>>>> instead of
> >>>>>>>>>>>>>>> this field. Does it make sense to you?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As for the second part of your message. I am not sure if I
> >>>>>>>>>> understood
> >>>>>>>>>>>>>>> it. The validators work with parse/deserialized values from
> >>>>>>>>>>>>>>> Configuration that means they can be bound to the generic
> >>>>>>>>>>>>>>> parameter
> >>>>>>>>>> of
> >>>>>>>>>>>>>>> Configuration. You can have a RangeValidator<? extends
> >>>>>>>>>>>>>>> Comparable/Number>. I don't think the type hierarchy in the
> >>>>>>>>>>>>>>> ConfigOption
> >>>>>>>>>>>>>>> has anything to do with the validation logic. Could you
> >>>>>>>>>>>>>>> elaborate a
> >>>>>>>>>>>>>>> bit
> >>>>>>>>>>>>>>> more what did you mean?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Dawid
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 18/08/2019 16:42, Stephan Ewen wrote:
> >>>>>>>>>>>>>>>> I like the idea of enhancing the configuration and to do
> >>> early
> >>>>>>>>>>>>>>> validation.
> >>>>>>>>>>>>>>>> I feel that some of the ideas in the FLIP seem a bit ad
> >>>>>>>>>>>>>>>> hoc,
> >>>>>>>>>>>>>>>> though. For
> >>>>>>>>>>>>>>>> example, having a boolean "isList" is a clear
> >>>>>>>>>>>>>>>> indication of
> >>> not
> >>>>>>>>>>>>>>>> having
> >>>>>>>>>>>>>>>> thought through the type/category system.
> >>>>>>>>>>>>>>>> Also, having a more clear category system makes validation
> >>>>>>>> simpler.
> >>>>>>>>>>>>>>>> For example, I have seen systems distinguishing between
> >>> numeric
> >>>>>>>>>>>>>>> parameters
> >>>>>>>>>>>>>>>> (valid ranges), category parameters (set of possible
> >>>>>>>>>>>>>>>> values),
> >>>>>>>>>>>>>>>> quantities
> >>>>>>>>>>>>>>>> like duration and memory size (need measure and unit),
> >>>>>>>>>>>>>>>> which
> >>>>>>>>>>>>>>>> results in
> >>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>> elegant system for validation.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <
> >>>>>>>>>> [email protected]
> >>>>>>>>>>>>>>> .invalid>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +1 to this, thanks Timo and Dawid for the design.
> >>>>>>>>>>>>>>>>> This allows the currently cluttered configuration of
> >>>>>>>>>>>>>>>>> various
> >>>>>>>>>>>>>>>>>       modules to be unified.
> >>>>>>>>>>>>>>>>> This is also first step of one of the keys to making new
> >>>>>>>>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>> TableEnvironment available for production.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Previously, we did encounter complex configurations,
> >>>>>>>>>>>>>>>>> such as
> >>>>>>>>>>>>>>>>> specifying the skewed values of column in DDL. The
> >>>>>>>>>>>>>>>>> skew may
> >>>>>>>>>>>>>>>>>       be a single field or a combination of multiple
> >>>>>>>>>>>>>>>>> fields.
> >>>>>>>>>>>>>>>>> So the
> >>>>>>>>>>>>>>>>>       configuration is very troublesome. We used JSON
> >>>>>>>>>>>>>>>>> string
> >>> to
> >>>>>>>>>>>>>>>>>       configure it.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jingsong Lee
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>>>>>>>>>>> From:Jark Wu <[email protected]>
> >>>>>>>>>>>>>>>>> Send Time:2019年8月16日(星期五) 16:44
> >>>>>>>>>>>>>>>>> To:dev <[email protected]>
> >>>>>>>>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and
> >>>>>>>>>> Configuration
> >>>>>>>>>>>>>>>>> Thanks for starting this design Timo and Dawid,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Improving ConfigOption has been hovering in my mind for a
> >>> long
> >>>>>>>>>> time.
> >>>>>>>>>>>>>>>>> We have seen the benefit when developing blink
> >>>>>>>>>>>>>>>>> configurations and
> >>>>>>>>>>>>>>> connector
> >>>>>>>>>>>>>>>>> properties in 1.9 release.
> >>>>>>>>>>>>>>>>> Thanks for bringing it up and make such a detailed
> >>>>>>>>>>>>>>>>> design.
> >>>>>>>>>>>>>>>>> I will leave my thoughts and comments there.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, 16 Aug 2019 at 22:30, Zili Chen <
> >>> [email protected]
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> It looks interesting. Thanks for preparing this FLIP!
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Client API enhancement benefit from this evolution which
> >>>>>>>>>>>>>>>>>> hopefully provides a better view of configuration of
> >>>>>>>>>>>>>>>>>> Flink.
> >>>>>>>>>>>>>>>>>> In client API enhancement, we likely make the deployment
> >>>>>>>>>>>>>>>>>> of cluster and submission of job totally defined by
> >>>>>>>>>> configuration.
> >>>>>>>>>>>>>>>>>> Will take a look at the document in days.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> tison.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Timo Walther <[email protected]> 于2019年8月16日周五
> >>>>>>>>>>>>>>>>>> 下午10:12写道：
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Dawid and I are working on making parts of
> >>>>>>>>>>>>>>>>>>> ExecutionConfig and
> >>>>>>>>>>>>>>>>>>> TableConfig configurable via config options. This is
> >>>>>>>>>>>>>>>>>>> necessary
> >>>>>>>>>>>>>>>>>>> to make
> >>>>>>>>>>>>>>>>>>> all properties also available in SQL. Additionally,
> >>>>>>>>>>>>>>>>>>> with
> >>> the
> >>>>>>>>>>>>>>>>>>> new SQL
> >>>>>>>>>>>>>>>>> DDL
> >>>>>>>>>>>>>>>>>>> based on properties as well as more connectors and
> >>>>>>>>>>>>>>>>>>> formats
> >>>>>>>>>>>>>>>>>>> coming up,
> >>>>>>>>>>>>>>>>>>> unified configuration becomes more important.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> We need more features around string-based configuration
> >>>>>>>>>>>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>> future,
> >>>>>>>>>>>>>>>>>>> which is why Dawid and I would like to propose
> >>>>>>>>>>>>>>>>>>> FLIP-54 for
> >>>>>>>>>>>>>>>>>>> evolving
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> ConfigOption and Configuration classes:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit
> >>>
> >>>>>>>>>>>>>>>>>>> In summary it adds:
> >>>>>>>>>>>>>>>>>>> - documented types and validation
> >>>>>>>>>>>>>>>>>>> - more common types such as memory size, duration, list
> >>>>>>>>>>>>>>>>>>> - simple non-nested object types
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Looking forward to your feedback,
> >>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>
> >
>
>

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Reply via email to