Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Timo Walther Tue, 03 Sep 2019 01:55:50 -0700

Hi Becket,

it is definitely API but I would rather consider configuration assecond-level API. Usually, a connector or plugin should have some nicebuilder pattern with helpful JavaDocs instead of just exposing the pureConfigOptions.

I think there is some general misunderstanding what a configurableobject should be:

1) When serializing a configurable object into a string representation,all attributes are put under a single top-level key. Your example of aplugin class should already spread multiple keys and should not use aconfigurable object.

2) A configurable object should not represent an entire configurationlike ExecutionConfig. Because why should you put all 50 attributes undera single key?

3) "avoid letting the Configurations to host arbitrary objects beyondthe primitive types": We need a way of expressing List<Tuple3<String,String, Boolean>> or List<Tuple2<String, Integer>>. There is no cleanway of expressing such data structures by only using primitive types.


Regards,
Timo



On 03.09.19 09:08, Dawid Wysakowicz wrote:


Hi Becket,

Regarding your example of the Configurable, this is not what weenvisioned. What we think of a Configurable is a way to store simplePOJOs that might be parameters for your some part of your system. Theexample you described should be rather expressed in a following way:


// Define the Plugin class ConfigOption.
ConfigOption<Class<SomePluginInterface>> option =
     ConfigOptions.key("pluginConfigKey")
  .classType(org.apache.flink.SomePluginInterface.class);

class Host extends Configurable {

        private String address;
        private Int port;

        private static final ConfigOption<String> addressOption
        private static final ConfigOption<Int> portOption;

        //let's assume the original solution for now
        private void fromConfiguration(ConfigurationReader config) {

                this.address = config.get(addressOption);
                this.port = config.get(portOption);
        }

}

ConfigOption<Host> hostOption =
     ConfigOptions.key("configKey1ForMyPluginClass")
  .classType(org.apache.flink.SomePluginInterface.class);


  // Instantiate the user configured plugin
Class<SomePluginInterface> pluginClass = configuration.get(pluginConfigKey);
Host host = configuration.get(hostOption);
pluginClass.newInstance(host);

  // Programmatically, users will do the following to set the plugin.
configurations.set(option, MyPluginClass.class);
configurations.set(hostOption, new Host()); // set the
configurations required by my plugin class.

// Non-programmatically, users can do the following to set the plugin in a
config file, e.g. yaml
pluginConfigKey: PATH_TO_MyPluginClass
configKey1ForMyPluginClass: host: localhost, port: 1234 // see that all options 
of Configurable are stored in a single key

"I'd imagine that in most cases, after a concrete Configurable (sayExecutionConfig) instance is created from the Configuration instance,we will just pass around the ExecutionConfig instead of theConfiguration object."

That's not the case. It's the Configuration object that's sent aroundin most of the cases. Itwould be a major rework of the Configurationhandling to actually change that.


Best,

Dawid


On 03/09/2019 05:59, Becket Qin wrote:

Hi Timo,

I think I might have misunderstood the scope or motivation of the FLIP a
little bit. Please let me clarify a little bit.

Regarding "great if we don't put this burden on users", we should

consider who is actually using this API. It is not first-level API but
mostly API for Flink contributors. Most of the users will use API
classes ike ExecutionConfig or TableConfig or other builders for
performing configuration. They will never use the ConfigOptions classes
directly. So enforcing a duplicate method does not sound like a burden
to me.

I thought the configuration will be used by all the plugins and components
such as connectors. If that is the case, the Configuration sounds a public
API just like UDF. So one may implement a plugin and set that plugin class
in the configuration. Flink will do something like following:

// Define the Plugin ConfigOption. SomePluginInterface extends Configurable.
ConfigOption<SomePluginInterface> option =
     ConfigOptions.key("pluginConfigKey")

  .ConfigurableType(org.apache.flink.SomePluginInterface.class);

// Instantiate the user configured plugin
SomePluginInterface plugin =
configuration.getConfigurableInstance(pluginConfigKey);

// Programmatically, users will do the following to set the plugin.
// Set the plugin class, alternatively, we can always require a
ConfigurableFactory class as value.
configurations.setConfigurable("pluginConfigKey", MyPluginClass.class);
configurations.setInt("configKey1ForMyPluginClass", 123); // set the
configurations required by my plugin class.

// Non-programmatically, users can do the following to set the plugin in a
config file, e.g. yaml
pluginConfigKey: PATH_TO_MyPluginClass // Or PATH_TO_MyPluginClassFactory
configKey1ForMyPluginClass: 123

Internally, the configuration may discover the MyPluginClassFactory, call
MyPluginClassFactory.create(Configuration) and pass in itself as the
configuration argument.

 From user's perspective, the way to use Configurable is the following:
1. Set a class type of the Plugin in the configuration via Configuration
interface.
2. Provide a factory class for the Plugin, either by config value or by
service provider mechanism.
3. Set the configurations consumed by the plugin, via something like a yaml
file, or programmatically via Configuration interface.

How would the util that you are suggesting look like? It would always

need to serialize/deserialize an object into an immutable string. This
is not very efficient, given that the instance already exists and can be
made immutable by the implementer by not exposing setters. Furthermore,
we would loose the declarative approach and could not generate
documentation. The current approach specifies the static final
sub-ConfigOptions either in Configurable object (initial design) or in
the ConfigurableFactory (current design) such that the docs generator
can read them.

I'd imagine that in most cases, after a concrete Configurable (say
ExecutionConfig) instance is created from the Configuration instance, we
will just pass around the ExecutionConfig instead of the Configuration
object. If so, the serialization to / deserialization from String will only
happen once per JVM, which seems fine. I am not sure why the doc generation
would be impacted. As long as the ConfigOptions go into the Configurable or
ConfigurableFactory, the docs generator can still read them, right?

Regarding "Configurable may be created and configured directly without

reading settings from a Configuration instance", there seems to be a
misunderstanding. This is a very common case if not the most common. As
mentioned before, take ExecutionConfig. This configuration is currently
only used in a programmatic-way and needs a way to be expressed as
ConfigOptions. CachedFile for instance will be a Configurable object
that will binary serialized most of the time when sending it to the
cluster but due to the Configurable design it is possible to store it in
a string representation as well.

Thanks for the explanation. I feel this creating object then serialize /
deserialize using configuration is more of an internal use case. We are
essentially using the configurations to pass some arbitrary string around.
Technically speaking we can use this way to send and receive any object.
But I am not sure if this is something that we want to generalize and
impact more public use cases.
Personally I feel that Configurable
As for CachedFile, it seems we do not plan to use configuration to pass
that on? It would be good to avoid letting the Configurations to host
arbitrary objects beyond the primitive types.

To summarize, I am thinking if we should consider the following:
1. Design the Config mechanism as a cross-board API for not only internal
usage, but for broader use cases.
2. If writeToConfiguration is only for internal use cases, maybe we can
avoid adding it to the configurable interface. We can add another interface
such as ExtractableConfigurable for internal usage.

What do you think?

Thanks,

Jiangjie (Becket) Qin

On Mon, Sep 2, 2019 at 11:59 PM Timo Walther<twal...@apache.org>  wrote:

@Becket:

Regarding "great if we don't put this burden on users", we should
consider who is actually using this API. It is not first-level API but
mostly API for Flink contributors. Most of the users will use API
classes ike ExecutionConfig or TableConfig or other builders for
performing configuration. They will never use the ConfigOptions classes
directly. So enforcing a duplicate method does not sound like a burden
to me.

How would the util that you are suggesting look like? It would always
need to serialize/deserialize an object into an immutable string. This
is not very efficient, given that the instance already exists and can be
made immutable by the implementer by not exposing setters. Furthermore,
we would loose the declarative approach and could not generate
documentation. The current approach specifies the static final
sub-ConfigOptions either in Configurable object (initial design) or in
the ConfigurableFactory (current design) such that the docs generator
can read them.

Regarding "Configurable may be created and configured directly without
reading settings from a Configuration instance", there seems to be a
misunderstanding. This is a very common case if not the most common. As
mentioned before, take ExecutionConfig. This configuration is currently
only used in a programmatic-way and needs a way to be expressed as
ConfigOptions. CachedFile for instance will be a Configurable object
that will binary serialized most of the time when sending it to the
cluster but due to the Configurable design it is possible to store it in
a string representation as well.

@Aljoscha:

Yes, this approach would also work. We still would need to call
writeToConf/readFromConf for duplicate() and ensure immutable semantics,
if this is really an important use case. But IMHO all configuration is
currently mutable (all API classes like ExecutionConfig,
CheckpointConfig, Configuration itself), I don't understand why
immutability needs to be discussed here.

Regards,
Timo


On 02.09.19 16:22, Aljoscha Krettek wrote:

Hi,

Regarding the factory and duplicate() and whatnot, wouldn’t it work to

have a factory like this:

/**
   * Allows to read and write an instance from and to {@link

Configuration}. A configurable instance

   * operates in its own key space in {@link Configuration} and will be

(de)prefixed by the framework. It cannot access keys from other options. A
factory must have a default constructor.

   *
   */
public interface ConfigurableFactory<T extends Configurable> {

      /**
       * Creates an instance from the given configuration.
       */
      T fromConfiguration(ConfigurationReader configuration);
}

with Configurable being:

/**
   * Allows to read and write an instance from and to {@link

Configuration}. A configurable instance

   * operates in its own key space in {@link Configuration} and will be

(de)prefixed by the framework. It cannot access keys from other options. A
factory must have a default constructor.

   *
   */
public interface Configurable {

      /**
       * Writes this instance to the given configuration.
       */
      void writeToConfiguration(ConfigurationWriter configuration); //

method name TBD

}

This would make the Configurable immutable and we wouldn’t need a

duplicate() method.

Best,
Aljoscha

On 2. Sep 2019, at 14:40, Becket Qin<becket....@gmail.com>  wrote:

Hi Timo and Dawid,

Thanks for the patient reply. I agree that both option a) and option b)

can

solve the mutability problem.

For option a), is it a little intrusive to add a duplicate() method for

Configurable? It would be great if we don't put this burden on users if
possible.

For option b), I actually feel it is slightly better than option a) from
API perspective as getFactory() seems a more understandable method of a
Configurable compared with duplicate(). And users do not need to

implement

much more logic.

I am curious what is the downside of keeping the Configuration simple to
only have primitive types, and always create the Configurable using a

util

method? If Configurables are rare, do we need to put the instantiation /
bookkeeping logic in Configuration?

@Becket for the toConfiguration this is required for shipping the

Configuration to TaskManager, so that we do not have to use java
serializability.

Do you mean a Configurable may be created and configured directly

without

reading settings from a Configuration instance? I thought a Configurable
will always be created via a ConfigurableFactory by extracting required
configs from a Configuration. Am I missing something?

Thanks,

Jiangjie (Becket) Qin

On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz <dwysakow...@apache.org
wrote:

Hi Timo, Becket

  From the options that Timo suggested for improving the mutability
situation I would prefer option a) as this is the more explicit option
and simpler option. Just as a remark, I think in general Configurable
types for options will be rather very rare for some special use-cases,
as the majority of options are rather simpler parameters of primitive
types (+duration, memory)

@Becket for the toConfiguration this is required for shipping the
Configuration to TaskManager, so that we do not have to use java
serializability.

Best,

Dawid


On 02/09/2019 10:05, Timo Walther wrote:

Hi Becket,

Re 1 & 3: "values in configurations should actually be immutable"

I would also prefer immutability but most of our configuration is
mutable due to serialization/deserialization. Also maps and list could
be mutable in theory. It is difficult to really enforce that for
nested structures. I see two options:

a) For the original design: How about we force implementers to add a
duplicate() method in a Configurable object? This would mean that the
object is still mutable but by duplicating the object both during
reading and writing we would avoid the problem you described.

b) For the current design: We still use the factory approach but let a
Configurable object implement a getFactory() method such that we know
how to serialize the object. With the help of a factory we can also
duplicate the object easily during reading and writing and ensure
immutability.

I would personally go for approach a) to not over-engineer this topic.
But I'm open for option b).

Regards,
Timo


On 31.08.19 04:09, Becket Qin wrote:

Hi Timo,

Thanks for the reply. I am still a little concerned over the
mutability of
the Configurable which could be the value in Configuration.

Re: 1

But in general, people should not use any internal fields.
Configurable objects are meant for simple little helper POJOs, not
complex arbitrary nested data structures.

This seems difficult to enforce... Ideally the values in

configurations

should actually be immutable. The value can only be changed by
explicitly
calling setters in Configuration. Otherwise we may have weird

situation

where the Configurable in the same configuration are different in two
places because the configurable is modified in one places and not
modified
in another place. So I am a little concerned on putting a
Configurable type
in the Configuration map, because the value could be modified without
explicitly setting the configuration. For example, can users do the
following?

Configurable configurable =
configuration.getConfigurable(myConfigurableOption);
configurable.setConfigA(123); // this already changes the

configurable

object in the configuration.

Re: 2
Thanks for confirming. As long as users will not have a situation

where

they need to set two configurations with the same key but different
descriptions, I think it is OK.

Re: 3
This is actually kind of related to 1. i.e. Whether toConfiguration()
guarantees the exact same object can be rebuilt from the
configuration or
not. I am still not quite sure about the use case of

toConfiguration()

though. It seems indicating the Configurable is mutable, which might

be

dangerous.

Thanks,

Jiangjie (Becket) Qin

On Fri, Aug 30, 2019 at 10:04 PM Timo Walther<twal...@apache.org>
wrote:

Hi Becket,

1. First of all, you are totally right. The FLIP contains a bug due

to

the last minute changes that Dawid suggested: by having immutable
objects created by a factory we loose the serializability of the
Configuration because the factory itself is not stored in the
Configuration. I would propose to revert the last change and stick

to

the original design, which means that a object must implement the
Configurable interface and also implements
serialization/deserialization
methods such that also internal fields can be persisted as you
suggested. But in general, people should not use any internal

fields.

Configurable objects are meant for simple little helper POJOs, not
complex arbitrary nested data structures.

It is Map<String, Object> because Configuration stores the raw

objects.

If you put a Boolean option into it, it remains Boolean. This makes

the

map very efficient for shipping to the cluster and accessing options
multiple times. The same for configurable objects. We put the pure
objects into the map without any serialization/deserialization. The
provided factory allows to convert the Object into a Configuration

and

we know how to serialize/deserializise a configuration because it is
just a key/value map.

2. Yes, this is what we had in mind. It should still be the same
configuration option. We would like to avoid specialized option keys
across components (exec.max-para and table.exec.max-para) if they

are

describing basically the same thing. But adding some more

description

like "TableOptions.MAX_PARALLELISM with description_1 +

description_2"

does not hurt.

3. They should restore the original object given that the
toConfiguration/fromConfiguration methods have been implemented
correctly. I will extend the example to make the logic clearer while
fixing the bug.

Thanks for the healthy discussion,
Timo


On 30.08.19 15:29, Becket Qin wrote:

Hi Timo,

Thanks again for the clarification. Please see a few more questions

below.

Re: 1

Please also keep in mind that Configuration must not consist of

only

strings, it manages a Map<String, Object> for efficient access.

Every

map entry can have a string representation for persistence, but in
most
cases consists of unserialized objects.

I'd like to understand this a bit more. The reason we have a
Map<String,
Object> in Configuration was because that Object could be either a

String,

a List or a Map, right? But they eventually always boil down to
Strings,

or

maybe one of the predefined type that we know how to serialize. In

the

current design, can the Object also be Configurable?
If the value in the config Map<String, Object> can be Configurable

objects,

how do we serialize them? Calling toConfiguration() seems not quite

working

because there might be some other internal fields that are not

part of

the

configuration. The modification to those fields will be lost if we
simply
use toConfiguration(). So the impact of modifying those

Configurable

objects seems a little undefined. And it would be difficult to

prevent

users from doing that.
If the value in the config Map<String, Object> cannot be

Configurable

objects, then it seems a little weird to have all the other

ConfigType

stored in the ConfigMap in their own native type and accessed via
getInteger() / getBoolean(), etc, while having ConfigurableType to

be

different from others because one have to use ConfigurableFactory
to get
the configurations.

Re: 2

I think about the withExtendedDescription as a helper getter in a
different place, so that the option is easier to find in a

different

module from it was defined.
The MAX_PARALLELISM option in TableOptions would conceptually be
equal

to:

public ConfigOption getMaxParallelismOption() {
       return CoreOptions.MAX_PARALLELISM;
}

Just to make sure I understand it correctly, does that mean users

will

see

something like following?
    - CoreOptions.MAX_PARALLELISM with description_1;
    - TableOptions.MAX_PARALLELISM with description_1 +

description_2.

    - DataStreamOptions.MAX_PARALLELISM with description_1 +
description_3.
And users will only configure exactly one MAX_PARALLELISM cross the

board.

So they won't be confused by setting two MAX_PARALLELISM config for
two
different modules, while they are actually the same config. If

that is

the

case, I don't have further concern.

Re: 3
Maybe I am thinking too much. I thought toBytes() / fromBytes()
actually
restore the original Object. But fromConfiguration() and

toConfiguration()

does not do that, anything not in the configuration of the original

object

will be lost. So it would be good to make that clear. Maybe a clear
Java
doc is sufficient.

Thanks,

Jiangjie (Becket) Qin

On Fri, Aug 30, 2019 at 4:08 PM Dawid Wysakowicz
<dwysakow...@apache.org

wrote:

Hi,

Ad. 1

The advantage of our approach is that you have the type definition
close
to the option definition. The only difference is that it enables
expressing simple pojos with the primitives like
ConfigOption<Integer>,
ConfigOption<Long> etc. Otherwise as Timo said you will start

having

the parsing logic scattered everywhere in the code base as it is

now.

The string representation in our proposal is exactly the same as

you

explained for those three options. The only difference is that you
don't
have to parse the elements of a List, Map etc. afterwards.

Ad. 2

I think about the withExtendedDescription as a helper getter in a
different place, so that the option is easier to find in a

different

module from it was defined.

The MAX_PARALLELISM option in TableOptions would conceptually be
equal

to:

public ConfigOption getMaxParallelismOption() {

       return CoreOptions.MAX_PARALLELISM;

}

This allows to further clarify the description of the option in

the

context of a different module and end up in a seperate page in
documentation (but with a link to the original one). In the end

it is

exactly the same option. It has exactly same key, type, parsing
logic,
it is in the end forwarded to the same place.

Ad. 3

Not sure if I understand your concerns here. As Timo said it is in
the
end sth similar to toBytes/fromBytes, but it puts itself to a
Configuration. Also just wanted to make sure we adjusted this part
slightly and now the ConfigOption takes ConfigurableFactory.

Best,

Dawid


On 30/08/2019 09:39, Timo Walther wrote:

Hi Becket,

thanks for the discussion.

1. ConfigOptions in their current design are bound to classes.
Regarding, the option is "creating some Configurable objects

instead

of defining the config to create
those Configurable"? We just moved this logic to a factory, this
factory can then also be used for other purposes. However, how

the

option and objects are serialized to Configuration is still not

part

of the option. The option is just pure declaration.

If we would allow only List<String>, implementers would need to
start
implementing own parsing and validation logic all the time. We

would

like to avoid that.

Please also keep in mind that Configuration must not consist of

only

strings, it manages a Map<String, Object> for efficient access.
Every
map entry can have a string representation for persistence, but

in

most cases consists of unserialized objects.

2. MAX_PARALLELISM is still defined just once. We don't overwrite
keys, types or default values. But different layers might want to
add
helpful information. In our concrete use case for FLIP-59,
ExecutionConfig has 50 properties and many of them are not

relevant

for the Table layer or have no effect at all. We would like to

list

and mention the most important config options again in the Table
Configuration section, so that users are not confused, but with a
strong link to the core option. E.g.: registered kryo serializers
are
also important also for Table users, we would like to add the
comment
"This option allows to modify the serialization of the ANY SQL

data

type.". I think we should not spam the core configuration page

with

comments from all layers, connectors, or libraries but keep this

in

the corresponding component documentation.

3. But it is something like fromBytes() and toBytes()? It

serializes

and deserializes T from a configuration?

Regards,
Timo

On 29.08.19 19:14, Becket Qin wrote:

Hi Timo and Stephan,

Thanks for the detail explanation.

1. I agree that each config should be in a human readable
format. My
concern is that the current List<Configurable> looks going a

little

too far
from what the configuration is supposed to do. They are

essentially

creating some Configurable objects instead of defining the
config to
create
those Configurable. This mixes ConfigOption and the usage of it.
API
wise
it would be good to keep the configs and their usages (such as
how to
create objects using the ConfigOption) apart from each other.
I am wondering if we can just make List also only take string.

For

example,
is the following definition of map and list sufficient?

A MapConfigOption is ConfigOption<Map<String, String>>. It can

be

defined
as:
map_config_name: k1=v1, k2=v2, k3=v3, ...

A ListConfigOption is ConfigOption<List<String>>. It can be

defined

as:

list_config_name: v1, v2, v3, ...

A ListOfMapConfigOption is ConfigOption<List<Map<String,
String>>. It
can
be defined as:
list_of_map_config_name: k1=v1, k2=v2; k3=v3, k4=v4;....

All the key and values in the configuration are String. This

also

guarantees that the configuration is always serializable.
If we want to do one more step, we can allow the ConfigOption to
set

all

the primitive types and parse that for the users. So something

like

List<Integer>, List<Class<?>> seems fine.

The configuration class could also have util methods to create a
list

of

configurable such as:
<T> List<T>

<Configuration#getConfigurableInstances(ListMapConfigOption,

Class<T> clazz).
But the configuration class will not take arbitrary

Configurable as

the

value of its config.

2. I might have misunderstood this. But my concern on the
description
extension is in the following example.

public static final ConfigOption<Integer> MAX_PARALLELISM =

CoreOptions.MAX_PARALLELISM.withExtendedDescription(
"Note: That this property means that a table program has a
side-effect
XYZ.");

In this case, we will have two MAX_PARALLELISM configs now. One

is

CoreOptions.MAX_PARALLELISM. The other one is defined here. I
suppose
users
will see both configurations. One with an extended description
and one
without. Let's say there is a third component which also users
MAX_PARALLELISM, will there be yet another MAX_PARALLELISM
ConfigOption? If
so, what would that ConfigOption's description look like?

Ideally, we would want to have just one

CoreOptions.MAX_PARALLELISM

and the
description should clearly state all the usage of this
ConfigOption.

3. I see, in that case, how about we name it something like
extractConfiguration()? I am just trying to see if we can make

it

clear

this is not something like fromBytes() and toBytes().

Thanks,

Jiangjie (Becket) Qin

On Thu, Aug 29, 2019 at 6:09 PM Timo Walther <

twal...@apache.org>

wrote:

Hi Becket,

let me try to clarify some of your questions:

1. For every option, we also needed to think about how to
represent

it

in a human readable format. We do not want to allow arbitrary
nesting
because that would easily allow to bypass the flattened
hierarchy of
config options (`session.memory.min`). The current design
allows to
represent every option type as a list. E.g.:

`myIntOption: 12` can be `myIntListOption: 12;12`
`myObjectOption: field=12,other=true` can be

`myObjectListOption:

field=12,other=true; field=12,other=true`
`myPropertyOption: key=str0,other=str1` can be
`myPropertyListOption:
key=str0,other=str1;key=str0,other=str1`

We need the atomic class for serialization/deserialization

both in

binary and string format.

ConfigOption<List> is not present in the code base yet, but

this

FLIP is
a preparation of making ExecutionConfig configurable. If you

look

into

this class or also in existing table connectors/formats, you
will see
that each proposed option type has its requirements.

2. Regarding extending the description of ConfigOptions, the
semantic of
one option should be a super set of the other option. E.g. in
Table

API

we might use general ExecutionConfig properties. But we would

like

to a)
make external options more prominent in the Table API config
docs to
link people to properties they should pay attention b) notice
about
side
effects. The core semantic of a property should not change.

3. The factory will not receive the entire configuration but

works

in a

separate key space. For `myObjectOption` above, it would

receive a

configuration that consists of `field: 12` and `other: true`.

I agree. I will convert the document into a Wiki page today.

Thanks,
Timo

On 29.08.19 09:00, Stephan Ewen wrote:

@Becket One thing that may be non-obvious is that the
Configuration
class
also defines serialization / persistence logic at the moment.
So it
needs
to know the set of types it supports. That stands in the way
of an
arbitrary generic map type.

@Timo I agree though that it seems a bit inconsistent to have

one

collection orthogonal to the type (List) and another one

bound to

specific

types (Map).

On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <

becket....@gmail.com

wrote:

Hi Timo,

Thanks for the proposal. Sorry for the late comments, but I
have a
few
questions / comments.

1. Is a new field of isList necessary in the ConfigOption?
Would it be enough to just check the atomicClass to see if it
is a
List

or

not?
Also, in the ConfigOption<Map> class case, are we always

assume

both key
and value types are String? Can we just apply the same to the
ConfigOption<List>?
BTW, I did a quick search in the codebase but did not find

any

usage of
ConfigOption<List>.

2. The same config name, but with two ConfigOption with
different

semantic

in different component seems super confusing. For example,

when

users

set

both configs, they may have no idea one is overriding the

other.

There
might be two cases:
      - If it is just the same config used by different
components to
act
accordingly, it might be better to just have one config, but

describe

clearly on how that config will be used.
      - If it is actually two configurations that can be set
differently, I
think the config names should just be different.

3. Regarding the ConfigurableFactory, is the

toConfiguration()

method

pretty much means getConfiguration()? The toConfiguration()
method

sounds

like converting an object to a configuration, which only
works if

the

object does not contain any state / value. I am also
wondering if
there

is

a real use case of this method. Because supposedly the

configurations

could

just be passed around to caller of this method.

Also, can you put the proposal into the FLIP wiki instead of
in the

Google

doc before voting? The FLIP wiki allows track the

modification

history

and

has a more established structure to ensure nothing is missed.

Thanks,

Jiangjie (Becket) Qin

On Tue, Aug 27, 2019 at 11:34 PM Timo Walther
<twal...@apache.org>

wrote:

Hi everyone,

I updated the FLIP proposal one more time as mentioned in

the

voting

thread. If there are no objections, I will start a new

voting

thread

tomorrow at 9am Berlin time.

Thanks,
Timo


On 22.08.19 14:19, Timo Walther wrote:

Hi everyone,

thanks for all the feedback we have received online and
offline.

It

showed that many people support the idea of evolving the

Flink

configuration functionality. I'm almost sure that this FLIP
will
not
solve all issues but at least will improve the current

status.

We've updated the document and replaced the Correlation

part

with the
concept of a ConfigOptionGroup that can provide all

available

options
of a group plus custom group validators for eager

validation.

For now,
this eager group validation will only be used at certain
locations in
the Flink code but it prepares for maybe validating the

entire

global
configuration before submitting a job in the future.

Please take another look if you find time. I hope we can
proceed
with
the voting process if there are no objections.

Regards,
Timo

Am 19.08.19 um 12:54 schrieb Timo Walther:

Hi Stephan,

thanks for your suggestions. Let me give you some

background

about

the decisions made in this FLIP:

1. Goal: The FLIP is labelled "evolve" not "rework"
because we

did

not want to change the entire configuration

infrastructure.

Both for
backwards-compatibility reasons and the amount of work

that

would be
required to update all options. If our goal is to rework

the

configuration option entirely, I might suggest to switch
to JSON
format with JSON schema and JSON validator. However,

setting

properties in a CLI or web interface becomes more tricky

the

more

nested structures are allowed.

2. Class-based Options: The current ConfigOption<T> class

is

centered
around Java classes where T is determined by the default
value.
The
FLIP just makes this more explicit by offering an explicit
`intType()` method etc. The current design of validators

centered

around Java classes makes it possible to have typical

domain

validators baked by generics as you suggested. If we
introduce
types
such as "quantity with measure and unit" we still need to
get a
class
out of this option at the end, so why changing a proven
concept?

3. List Options: The `isList` prevents having arbitrary
nesting. As
Dawid mentioned, we kept human readability in mind. For

every

atomic
option like "key=12" can be represented by a list
"keys=12;13".
But
we don't want to go further; esp. no nesting. A dedicated
list
option
would start making this more complicated such as
"ListOption(ObjectOption(ListOption(IntOption, ...),
StringOption(...)))", do we want that?

4. Correlation: The correlation part is one of the
suggestions
that I
like least in the document. We can also discuss removing

it

entirely,
but I think it solves the use case of relating options
with each
other in a flexible way right next to the actual option.
Instead of
being hidden in some component initialization, we should
put it
close
to the option to also perform validation eagerly instead

of

failing
at runtime when the option is accessed the first time.

Regards,
Timo


Am 18.08.19 um 23:32 schrieb Stephan Ewen:

A "List Type" sounds like a good direction to me.

The comment on the type system was a bit brief, I agree.

The

idea is
to see
if something like that can ease validation. Especially

the

correlation

system seems quite complex (proxies to work around order

of

initialization).

For example, let's assume we don't think primarily about
"java
types" but
would define types as one of the following (just

examples,

haven't
thought
all the details through):

        (a) category type: implies string, and a fix set

of

possible

values.

Those would be passes and naturally make it into the docs
and
validation.
Maps to a String or Enum in Java.

        (b) numeric integer type: implies long (or

optionally

integer,

if

we want
to automatically check overflow / underflow). would take

typical

domain

validators, like non-negative, etc.

        (c) numeric real type: same as above (double or
float)

        (d) numeric interval type: either defined as an

interval, or

references
other parameter by key. validation by valid interval.

        (e) quantity: a measure and a unit. separately
parsable.

The

measure's
type could be any of the numeric types above, with same
validation
rules.

With a system like the above, would we still correlation
validators?
Are
there still cases that we need to catch early (config
loading)

or

are the
remaining cases sufficiently rare and runtime or setup

specific,

that it is
fine to handle them in component initialization?


On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz
<dwysakow...@apache.org>
wrote:

Hi Stephan,

Thank you for your opinion.

Actually list/composite types are the topics we spent

the

most of

the

time. I understand that from a perspective of a full

blown

type

system,
a field like isList may look weird. Please let me
elaborate a
bit

more

on the reason behind it though. Maybe we weren't clear
enough
about

it

in the FLIP. The key feature of all the conifg options

is

that they
must
have a string representation as they might come from a

configuration

file. Moreover it must be a human readable format, so
that the

values

might be manually adjusted. Having that in mind we did

not

want to
add a
support of an arbitrary nesting and we decided to allow

for

lists

only

(and flat objects - I think though in the current design
there is a
mistake around the Configurable interface). I think

though

you have

point here and it would be better to have a
ListConfigOption
instead of
this field. Does it make sense to you?

As for the second part of your message. I am not sure

if I

understood

it. The validators work with parse/deserialized values

from

Configuration that means they can be bound to the

generic

parameter

of

Configuration. You can have a RangeValidator<? extends
Comparable/Number>. I don't think the type hierarchy in

the

ConfigOption
has anything to do with the validation logic. Could you
elaborate a
bit
more what did you mean?

Best,

Dawid

On 18/08/2019 16:42, Stephan Ewen wrote:

I like the idea of enhancing the configuration and to

do

early

validation.

I feel that some of the ideas in the FLIP seem a bit ad
hoc,
though. For
example, having a boolean "isList" is a clear
indication of

not

having
thought through the type/category system.
Also, having a more clear category system makes

validation

simpler.

For example, I have seen systems distinguishing between

numeric

parameters

(valid ranges), category parameters (set of possible
values),
quantities
like duration and memory size (need measure and unit),
which
results in

an

elegant system for validation.


On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <

lzljs3620...@aliyun.com

.invalid>

wrote:

+1 to this, thanks Timo and Dawid for the design.
This allows the currently cluttered configuration of
various
       modules to be unified.
This is also first step of one of the keys to making

new

unified
TableEnvironment available for production.

Previously, we did encounter complex configurations,
such as
specifying the skewed values of column in DDL. The
skew may
       be a single field or a combination of multiple
fields.
So the
       configuration is very troublesome. We used JSON
string

to

       configure it.

Best,
Jingsong Lee

------------------------------------------------------------------

From:Jark Wu<imj...@gmail.com>
Send Time:2019年8月16日(星期五) 16:44
To:dev<dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and

Configuration

Thanks for starting this design Timo and Dawid,

Improving ConfigOption has been hovering in my mind

for a

long

time.

We have seen the benefit when developing blink
configurations and

connector

properties in 1.9 release.
Thanks for bringing it up and make such a detailed
design.
I will leave my thoughts and comments there.

Cheers,
Jark


On Fri, 16 Aug 2019 at 22:30, Zili Chen <

wander4...@gmail.com

wrote:

Hi Timo,

It looks interesting. Thanks for preparing this FLIP!

Client API enhancement benefit from this evolution

which

hopefully provides a better view of configuration of
Flink.
In client API enhancement, we likely make the

deployment

of cluster and submission of job totally defined by

configuration.

Will take a look at the document in days.

Best,
tison.


Timo Walther<twal...@apache.org>  于2019年8月16日周五
下午10:12写道：

Hi everyone,

Dawid and I are working on making parts of
ExecutionConfig and
TableConfig configurable via config options. This is
necessary
to make
all properties also available in SQL. Additionally,
with

the

new SQL

DDL

based on properties as well as more connectors and
formats
coming up,
unified configuration becomes more important.

We need more features around string-based

configuration

in the
future,
which is why Dawid and I would like to propose
FLIP-54 for
evolving

the

ConfigOption and Configuration classes:

https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit

In summary it adds:
- documented types and validation
- more common types such as memory size, duration,

list

- simple non-nested object types

Looking forward to your feedback,
Timo

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Reply via email to