Hi Becket,
let me try to clarify some of your questions:
1. For every option, we also needed to think about how to represent it
in a human readable format. We do not want to allow arbitrary nesting
because that would easily allow to bypass the flattened hierarchy of
config options (`session.memory.min`). The current design allows to
represent every option type as a list. E.g.:
`myIntOption: 12` can be `myIntListOption: 12;12`
`myObjectOption: field=12,other=true` can be `myObjectListOption:
field=12,other=true; field=12,other=true`
`myPropertyOption: key=str0,other=str1` can be `myPropertyListOption:
key=str0,other=str1;key=str0,other=str1`
We need the atomic class for serialization/deserialization both in
binary and string format.
ConfigOption<List> is not present in the code base yet, but this FLIP is
a preparation of making ExecutionConfig configurable. If you look into
this class or also in existing table connectors/formats, you will see
that each proposed option type has its requirements.
2. Regarding extending the description of ConfigOptions, the semantic of
one option should be a super set of the other option. E.g. in Table API
we might use general ExecutionConfig properties. But we would like to a)
make external options more prominent in the Table API config docs to
link people to properties they should pay attention b) notice about side
effects. The core semantic of a property should not change.
3. The factory will not receive the entire configuration but works in a
separate key space. For `myObjectOption` above, it would receive a
configuration that consists of `field: 12` and `other: true`.
I agree. I will convert the document into a Wiki page today.
Thanks,
Timo
On 29.08.19 09:00, Stephan Ewen wrote:
@Becket One thing that may be non-obvious is that the Configuration class
also defines serialization / persistence logic at the moment. So it needs
to know the set of types it supports. That stands in the way of an
arbitrary generic map type.
@Timo I agree though that it seems a bit inconsistent to have one
collection orthogonal to the type (List) and another one bound to specific
types (Map).
On Thu, Aug 29, 2019 at 8:20 AM Becket Qin <becket....@gmail.com> wrote:
Hi Timo,
Thanks for the proposal. Sorry for the late comments, but I have a few
questions / comments.
1. Is a new field of isList necessary in the ConfigOption?
Would it be enough to just check the atomicClass to see if it is a List or
not?
Also, in the ConfigOption<Map> class case, are we always assume both key
and value types are String? Can we just apply the same to the
ConfigOption<List>?
BTW, I did a quick search in the codebase but did not find any usage of
ConfigOption<List>.
2. The same config name, but with two ConfigOption with different semantic
in different component seems super confusing. For example, when users set
both configs, they may have no idea one is overriding the other. There
might be two cases:
- If it is just the same config used by different components to act
accordingly, it might be better to just have one config, but describe
clearly on how that config will be used.
- If it is actually two configurations that can be set differently, I
think the config names should just be different.
3. Regarding the ConfigurableFactory, is the toConfiguration() method
pretty much means getConfiguration()? The toConfiguration() method sounds
like converting an object to a configuration, which only works if the
object does not contain any state / value. I am also wondering if there is
a real use case of this method. Because supposedly the configurations could
just be passed around to caller of this method.
Also, can you put the proposal into the FLIP wiki instead of in the Google
doc before voting? The FLIP wiki allows track the modification history and
has a more established structure to ensure nothing is missed.
Thanks,
Jiangjie (Becket) Qin
On Tue, Aug 27, 2019 at 11:34 PM Timo Walther <twal...@apache.org> wrote:
Hi everyone,
I updated the FLIP proposal one more time as mentioned in the voting
thread. If there are no objections, I will start a new voting thread
tomorrow at 9am Berlin time.
Thanks,
Timo
On 22.08.19 14:19, Timo Walther wrote:
Hi everyone,
thanks for all the feedback we have received online and offline. It
showed that many people support the idea of evolving the Flink
configuration functionality. I'm almost sure that this FLIP will not
solve all issues but at least will improve the current status.
We've updated the document and replaced the Correlation part with the
concept of a ConfigOptionGroup that can provide all available options
of a group plus custom group validators for eager validation. For now,
this eager group validation will only be used at certain locations in
the Flink code but it prepares for maybe validating the entire global
configuration before submitting a job in the future.
Please take another look if you find time. I hope we can proceed with
the voting process if there are no objections.
Regards,
Timo
Am 19.08.19 um 12:54 schrieb Timo Walther:
Hi Stephan,
thanks for your suggestions. Let me give you some background about
the decisions made in this FLIP:
1. Goal: The FLIP is labelled "evolve" not "rework" because we did
not want to change the entire configuration infrastructure. Both for
backwards-compatibility reasons and the amount of work that would be
required to update all options. If our goal is to rework the
configuration option entirely, I might suggest to switch to JSON
format with JSON schema and JSON validator. However, setting
properties in a CLI or web interface becomes more tricky the more
nested structures are allowed.
2. Class-based Options: The current ConfigOption<T> class is centered
around Java classes where T is determined by the default value. The
FLIP just makes this more explicit by offering an explicit
`intType()` method etc. The current design of validators centered
around Java classes makes it possible to have typical domain
validators baked by generics as you suggested. If we introduce types
such as "quantity with measure and unit" we still need to get a class
out of this option at the end, so why changing a proven concept?
3. List Options: The `isList` prevents having arbitrary nesting. As
Dawid mentioned, we kept human readability in mind. For every atomic
option like "key=12" can be represented by a list "keys=12;13". But
we don't want to go further; esp. no nesting. A dedicated list option
would start making this more complicated such as
"ListOption(ObjectOption(ListOption(IntOption, ...),
StringOption(...)))", do we want that?
4. Correlation: The correlation part is one of the suggestions that I
like least in the document. We can also discuss removing it entirely,
but I think it solves the use case of relating options with each
other in a flexible way right next to the actual option. Instead of
being hidden in some component initialization, we should put it close
to the option to also perform validation eagerly instead of failing
at runtime when the option is accessed the first time.
Regards,
Timo
Am 18.08.19 um 23:32 schrieb Stephan Ewen:
A "List Type" sounds like a good direction to me.
The comment on the type system was a bit brief, I agree. The idea is
to see
if something like that can ease validation. Especially the
correlation
system seems quite complex (proxies to work around order of
initialization).
For example, let's assume we don't think primarily about "java
types" but
would define types as one of the following (just examples, haven't
thought
all the details through):
(a) category type: implies string, and a fix set of possible
values.
Those would be passes and naturally make it into the docs and
validation.
Maps to a String or Enum in Java.
(b) numeric integer type: implies long (or optionally integer, if
we want
to automatically check overflow / underflow). would take typical
domain
validators, like non-negative, etc.
(c) numeric real type: same as above (double or float)
(d) numeric interval type: either defined as an interval, or
references
other parameter by key. validation by valid interval.
(e) quantity: a measure and a unit. separately parsable. The
measure's
type could be any of the numeric types above, with same validation
rules.
With a system like the above, would we still correlation validators?
Are
there still cases that we need to catch early (config loading) or
are the
remaining cases sufficiently rare and runtime or setup specific,
that it is
fine to handle them in component initialization?
On Sun, Aug 18, 2019 at 6:36 PM Dawid Wysakowicz
<dwysakow...@apache.org>
wrote:
Hi Stephan,
Thank you for your opinion.
Actually list/composite types are the topics we spent the most of
the
time. I understand that from a perspective of a full blown type
system,
a field like isList may look weird. Please let me elaborate a bit
more
on the reason behind it though. Maybe we weren't clear enough about
it
in the FLIP. The key feature of all the conifg options is that they
must
have a string representation as they might come from a configuration
file. Moreover it must be a human readable format, so that the
values
might be manually adjusted. Having that in mind we did not want to
add a
support of an arbitrary nesting and we decided to allow for lists
only
(and flat objects - I think though in the current design there is a
mistake around the Configurable interface). I think though you have
a
point here and it would be better to have a ListConfigOption
instead of
this field. Does it make sense to you?
As for the second part of your message. I am not sure if I
understood
it. The validators work with parse/deserialized values from
Configuration that means they can be bound to the generic parameter
of
Configuration. You can have a RangeValidator<? extends
Comparable/Number>. I don't think the type hierarchy in the
ConfigOption
has anything to do with the validation logic. Could you elaborate a
bit
more what did you mean?
Best,
Dawid
On 18/08/2019 16:42, Stephan Ewen wrote:
I like the idea of enhancing the configuration and to do early
validation.
I feel that some of the ideas in the FLIP seem a bit ad hoc,
though. For
example, having a boolean "isList" is a clear indication of not
having
thought through the type/category system.
Also, having a more clear category system makes validation simpler.
For example, I have seen systems distinguishing between numeric
parameters
(valid ranges), category parameters (set of possible values),
quantities
like duration and memory size (need measure and unit), which
results in
an
elegant system for validation.
On Fri, Aug 16, 2019 at 5:22 PM JingsongLee <
lzljs3620...@aliyun.com
.invalid>
wrote:
+1 to this, thanks Timo and Dawid for the design.
This allows the currently cluttered configuration of various
modules to be unified.
This is also first step of one of the keys to making new unified
TableEnvironment available for production.
Previously, we did encounter complex configurations, such as
specifying the skewed values of column in DDL. The skew may
be a single field or a combination of multiple fields. So the
configuration is very troublesome. We used JSON string to
configure it.
Best,
Jingsong Lee
------------------------------------------------------------------
From:Jark Wu <imj...@gmail.com>
Send Time:2019年8月16日(星期五) 16:44
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and
Configuration
Thanks for starting this design Timo and Dawid,
Improving ConfigOption has been hovering in my mind for a long
time.
We have seen the benefit when developing blink configurations and
connector
properties in 1.9 release.
Thanks for bringing it up and make such a detailed design.
I will leave my thoughts and comments there.
Cheers,
Jark
On Fri, 16 Aug 2019 at 22:30, Zili Chen <wander4...@gmail.com>
wrote:
Hi Timo,
It looks interesting. Thanks for preparing this FLIP!
Client API enhancement benefit from this evolution which
hopefully provides a better view of configuration of Flink.
In client API enhancement, we likely make the deployment
of cluster and submission of job totally defined by
configuration.
Will take a look at the document in days.
Best,
tison.
Timo Walther <twal...@apache.org> 于2019年8月16日周五
下午10:12写道:
Hi everyone,
Dawid and I are working on making parts of ExecutionConfig and
TableConfig configurable via config options. This is necessary
to make
all properties also available in SQL. Additionally, with the
new SQL
DDL
based on properties as well as more connectors and formats
coming up,
unified configuration becomes more important.
We need more features around string-based configuration in the
future,
which is why Dawid and I would like to propose FLIP-54 for
evolving
the
ConfigOption and Configuration classes:
https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit
In summary it adds:
- documented types and validation
- more common types such as memory size, duration, list
- simple non-nested object types
Looking forward to your feedback,
Timo