Thanks for the update, Jark. +1 for your proposal.

Some minor feedback from my side:

sink.bulk-flush -> sink.buffer-flush?
zookeeper.znode.parent -> zookeeper.znode-parent?

for consistency.

username -> secrect.username?
password -> secrect.password?

How about we prefix options that should not show up in logs with "secret."?

I agree with Dawid, we should not mix factory identifier with updates to the property design. I would introduce a general `property-version` option that users can add to their DDL if they want to ensure backwards compatibility. All factories would get this option by default, they don't need to list it explicitly.

Regards,
Timo


On 31.03.20 09:21, Dawid Wysakowicz wrote:
Hi,

As for the `connector.property-version`. I have doubts that having
deprecated keys mechanism is enough to fully support backwards
compatibility. I do think we are changing not just the key names, but
structure of the keys as well as the discovery mechanism and I think it
is really hard to foresee all the interconnections. I don't think users
need to learn about the `connector.property-version` too much. By
default it would always be set to the newest version. Users could still
set this property to an old version if they don't want to migrate the
properties, but go fully against the old structure. I think such
mechanism could be useful.

I understand a similar thing can be achieved with renaming/changing the
version in the factoryIdentifier. 'kafka' can use the new properties,
'kafka-v1' the old properties. I think though having a dedicated
mechanism could make it a tad easier. I am not strict on that though,
just wanted to express it.

Best,

Dawid

On 31/03/2020 08:58, Jark Wu wrote:
Hi everyone,

Thanks for the great feedbacks so far.

I updated the FLIP documentation according to the discussion. Changes
include:
- remove "version" key, and merge it into "connector"
- add "scan", "lookup", "sink" prefix to some property keys if they are
only used in that case.
- add a "New Property Key" section to list all the previous property keys
and new property keys.

We use "scan" and "lookup" instead of "source" prefix because we should
distinguish them and they aligns to FLIP-95 ScanTableSource and
LookupTableSource.
I also colored red for some major change of property keys in the FLIP. I
will list some of them here too:

kafka:
connector.startup-mode => scan.startup.mode
connector.specific-offsets => scan.startup.specific-offsets
connector.startup-timestamp-millis => scan.startup.timestamp-millis
connector.sink-partitioner & connector.sink-partitioner-class =>
sink.partitioner

elasticsearch:
connector.key-delimiter => document-id.key-delimiter              # make it
explicit that it is used for document id
connector.key-null-literal => document-id.key-null-literal          # and
it also can be used for es sources in the future
connector.bulk-flush.back-off.type => sink.bulk-flush.back-off.strategy

jdbc:
connector.table => table-name

Welcome further feedbacks!

Best,
Jark


On Tue, 31 Mar 2020 at 14:45, Jark Wu <imj...@gmail.com> wrote:

Hi Kurt,

I also prefer "-" as version delimiter now. I didn't remove the "_"
proposal by mistake, that's why I sent another email last night :)
Regarding to "property-version", I also think we shouldn't let users to
learn about this. And ConfigOption provides a good ability
to support deprecated keys and auto-generate documentation for deprecated
keys.

Hi Danny,

Regarding to “connector.properties.*”:
In FLIP-95, the Factory#requiredOptions() and Factory#optionalOptions()
inferfaces are only used for generation of documentation.
It does not influence the discovery and validation of a factory. The
validation logic is defined by connectors
in createDynamicTableSource/Sink().
So you don't have to provide an option for "connector.properties.*". But I
think we should make ConfigOption support wildcard in the long term for a
full story.

I don't think we should inline all the "connector.properties.*",
otherwise, it will be very tricky for users to configure the properties.
Regarding to FLIP-113, I suggest to provide some ConfigOptions for
commonly used kafka properties and put them in the supportedHintOptions(),
e.g. "connector.properties.group.id",
"connector.properties.fetch.min.bytes".

Best,
Jark





On Tue, 31 Mar 2020 at 12:04, Danny Chan <yuzhao....@gmail.com> wrote:

Thanks Jark for bring up this discussion, +1 for this idea, I believe the
user has suffered from the verbose property key for long time.

Just one question, how do we handle the keys with wildcard, such as the
“connector.properties.*” in Kafka connector which would then hand-over to
Kafka client directly. As what suggested in FLIP-95, we use a ConfigOption
to describe the “supported properties”, then I have to concerns:

• For the new keys, do we still need to put multi-lines there the such
key, such as “connector.properties.abc” “connector.properties.def”, or
should we inline them, such as “some-key-prefix” = “k1=v1, k2=v2 ..."
• Should the ConfigOption support the wildcard ? (If we plan to support
the current multi-line style)


Best,
Danny Chan
在 2020年3月31日 +0800 AM12:37,Jark Wu <imj...@gmail.com>,写道:
Hi all,

Thanks for the feedbacks.

It seems that we have a conclusion to put the version into the factory
identifier. I'm also fine with this.
If we have this outcome, the interface of Factory#factoryVersion is not
needed anymore, this can simplify the learning cost of new factory.
We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
<twal...@apache.org>

kafka => kafka for 0.11+ versions, we don't suffix "-universal", because
the meaning of "universal" not easy to understand.
kafka-0.11 => kafka for 0.11 version
kafka-0.10 => kafka for 0.10 version
elasticsearch-6 => elasticsearch for 6.x versions
elasticsearch-7 => elasticsearch for 7.x versions
hbase-1.4 => hbase for 1.4.x versions
jdbc
filesystem

We use "-" as the version delimiter to make them to be more consistent.
This is not forces, users can also use other delimiters or without
delimiter.
But this can be a guilde in the Javadoc of Factory, to make the
connector
ecosystem to be more consistent.

What do you think?

------------------------

Regarding "connector.property-version":

Hi @Dawid Wysakowicz <dwysakow...@apache.org> , the new fatories are
designed not support to read current properties.
All the current properties are routed to the old factories if they are
using "connector.type". Otherwise, properties are routed to new
factories.
If I understand correctly, the "connector.property-version" is attched
implicitly by system, not manually set by users.
For example, the framework should add "connector.property-version=1" to
properties when processing DDL statement.
I'm fine to add a "connector.property-version=1" when processing DDL
statement, but I think it's also fine if we don't,
because this can be done in the future if need and the default version
can
be 1.

Best,
Jark

On Tue, 31 Mar 2020 at 00:36, Jark Wu <imj...@gmail.com> wrote:

Hi all,

Thanks for the feedbacks.

It seems that we have a conclusion to put the version into the factory
identifier. I'm also fine with this.
If we have this outcome, the interface of Factory#factoryVersion is
not
needed anymore, this can simplify the learning cost of new factory.
We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
<twal...@apache.org>

Btw, I would like to use "_" instead of "-" as the version delimiter,
because "-" looks like minus and may confuse users, e.g.
"elasticsearch-6".
This is not forced, but should be a guilde in the Javadoc of Factory.
I propose to use the following identifiers for existing connectors,

kafka => kafka for 0.11+ versions, we don't suffix "-universal",
because
the meaning of "universal" not easy to understand.
kafka-0.11 => kafka for 0.11 version
kafka-0.10 => kafka for 0.10 version
elasticsearch-6 => elasticsearch for 6.x versions
elasticsearch-7 => elasticsearch for 7.x versions
hbase-1.4 => hbase for 1.4.x versions
jdbc
filesystem

We use "-" as the version delimiter to make them to be more
consistent.
This is not forces, users can also use other delimiters or without
delimiter.
But this can be a guilde in the Javadoc of Factory, to make the
connector
ecosystem to be more consistent.

What do you think?

------------------------

Regarding "connector.property-version":

Hi @Dawid Wysakowicz <dwysakow...@apache.org> , the new fatories are
designed not support to read current properties.
All the current properties are routed to the old factories if they are
using "connector.type". Otherwise, properties are routed to new
factories.
If I understand correctly, the "connector.property-version" is attched
implicitly by system, not manually set by users.
For example, the framework should add "connector.property-version=1"
to
properties when processing DDL statement.
I'm fine to add a "connector.property-version=1" when processing DDL
statement, but I think it's also fine if we don't,
because this can be done in the future if need and the default
version can
be 1.

Best,
Jark





On Mon, 30 Mar 2020 at 23:21, Dawid Wysakowicz <
dwysakow...@apache.org>
wrote:

Hi all,

I like the overall design of the FLIP.

As for the withstanding concerns. I kind of like the approach to
put the
version into the factory identifier. I think it's the cleanest way
to
say that this version actually applies to the connector itself and
not
to the system it connects to. BTW, I think the outcome of this
discussion will affect interfaces described in FLIP-95. If we put
the
version into the functionIdentifier, do we need
Factory#factoryVersion?
I also think it does make sense to have a versioning for the
properties
as well. Are we able to read all the current properties with the new
factories? I think we could use the "connector.property-version" to
alternate between different Factory interfaces to support the old
set of
properties. Otherwise the new factories need to understand both set
of
properties, don't they?

Best,

Dawid

On 30/03/2020 17:07, Timo Walther wrote:
Hi Jark,

thanks for the FLIP. I don't have a strong opinion on
DynamicTableFactory#factoryVersion() for distiguishing connector
versions. We can also include it in the factory identifier
itself. For
Kafka, I would then just use `kafka` because the naming
"universal"
was just a helper solution at that time.

What we need is a good guide for how to design the options. We
should
include that in the JavaDoc of factory interface itself, once it
is
in-place. I like the difference between source and sink in
properties.
Regarding "connector.property-version":

It was meant for backwards compatibility along the dimension of
"property design" compared to "connector version". However, since
the
connector is now responsible for parsing all properties, it is
not as
necessary as it was before.

However, a change of the property design as it is done in FLIP-107
becomes more difficult without a property version and versioning
is
always a good idea in API world.

Regards,
Timo


On 30.03.20 16:38, Benchao Li wrote:
Hi Jark,

Thanks for starting the discussion. The FLIP LGTM generally.

Regarding connector version VS put version into connector's
name,
I favor the latter personally, using one property to locate a
connector can make the error log more precise.
 From the users' side, using one property to match a connector
will
be easier. Especially we have many connectors,
and some of the need version property required, and some of
them not.
Regarding Jingsong's suggestion,
IMO, it's a very good complement to the FLIP. Distinguishing
properties for source and sink can be very useful, and
also this will make the connector property more precise.
We are also sick of this for now, we cannot know whether a DDL
is a
source or sink unless we look through all queries where
the table is used.
Even more, some of the required properties are only required for
source, bug we cannot leave it blank for sink, and vice versa.
I think we can also add a type for dimension tables except
source and
sink.

Kurt Young <ykt...@gmail.com <mailto:ykt...@gmail.com>>
于2020年3月30日
周一 下午8:16写道:

It's not possible for the framework to throw such exception.
Because the
framework doesn't know what versions do the connector support.

Not really, we can list all valid connectors framework could
found. E.g.
user mistyped 'kafka-0.x', the error message will looks like:

we don't have any connector named "kafka-0.x", but we have:
FileSystem
Kafka-0.10
Kafka-0.11
ElasticSearch6
ElasticSearch7

Best,
Kurt


On Mon, Mar 30, 2020 at 5:11 PM Jark Wu <imj...@gmail.com
<mailto:imj...@gmail.com>> wrote:

Hi Kurt,

2) Lists all available connectors seems also quite
straightforward, e.g
user provided a wrong "kafka-0.8", we tell user we have
candidates of
"kakfa-0.11", "kafka-universal"
It's not possible for the framework to throw such exception.
Because the
framework doesn't know what versions do the connector support.
All the
version information is a blackbox in the identifier. But with
`Factory#factoryVersion()` interface, we can know all the
supported
versions.

3) I don't think so. We can still treat it as the same
connector but with
different versions.
That's true but that's weird. Because from the plain DDL
definition, they
look like different connectors with different "connector"
value, e.g.
'connector=kafka-0.8', 'connector=kafka-0.10'.

If users don't set any version, we will use
"kafka-universal"
instead.
The behavior is inconsistent IMO.
That is a long term vision when there is no kafka clusters
with <0.11
version.
At that point, "universal" is the only supported version in
Flink
and the
"version" key can be optional.

---------------------------------------------

Hi Jingsong,

"version" vs "kafka.version"
I though about it. But if we prefix "kafka" to version, we
should
prefix
"kafka" for all other property keys, because they are all
kafka
specific
options.
However, that will make the property set verbose, see rejected
option#2 in
the FLIP.

explicitly separate options for source and sink
That's a good topic. It's good to have a guideline for the new
property
keys.
I'm fine to prefix with a "source"/"sink" for some connector
keys.
Actually, we already do this in some connectors, e.g. jdbc and
hbase.
Best,
Jark

On Mon, 30 Mar 2020 at 16:36, Jingsong Li <
jingsongl...@gmail.com
<mailto:jingsongl...@gmail.com>> wrote:
Thanks Jark for the proposal.

+1 to the general idea.

For "version", what about "kafka.version"? It is obvious to
know its
meaning.

And I'd like to start a new topic:
Should we need to explicitly separate source from sink?
With the development of batch and streaming, more and more
connectors
have
both source and sink.

So should we set a rule for table properties:
- properties for both source and sink: without prefix, like
"topic"
- properties for source only: with "source." prefix, like
"source.startup-mode"
- properties for sink only: with "sink." prefix, like
"sink.partitioner"
What do you think?

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 3:56 PM Jark Wu <imj...@gmail.com
<mailto:imj...@gmail.com>> wrote:
Hi Kurt,

That's good questions.

the meaning of "version"
There are two versions in the old design. One is property
version
"connector.property-version" which can be used for
backward
compatibility.
The other one is "connector.version" which defines the
version of
external
system, e.g. 0.11" for kafka, "6" or "7" for ES.
In this proposal, the "version" is the previous
"connector.version".
The
""connector.property-version" is not introduced in new
design.
how to keep the old capability which can evolve
connector
properties
The "connector.property-version" is an optional key in the
old design
and
is never bumped up.
I'm not sure how "connector.property-version" should work
in the
initial
design. Maybe @Timo Walther <twal...@apache.org
<mailto:twal...@apache.org>> has more knowledge on
this.
But for the new properties, every options should be
expressed as
`ConfigOption` which provides `withDeprecatedKeys(...)`
method to
easily
support evolving keys.

a single keys instead of two, e.g. "kafka-0.11",
"kafka-universal"?
There are several benefit to use separate "version" key I
can
see:
1) it's more readable to separete them into different
keys,
because
they
are orthogonal concepts.
2) the planner can give all the availble versions in the
exception
message,
if user uses a wrong version (this is often reported in
user ML).
3) If we use "kafka-0.11" as connector identifier, we may
have to
write a
full documentation for each version, because they are
different
"connector"?
IMO, for 0.11, 0.11, etc... kafka, they are actually
the same
connector
but with different "client jar" version,
they share all the same supported property keys and
should reside
together.
4) IMO, the future vision is version-free. At some point
in
the future,
we
may don't need users to specify kafka version anymore, and
make
"version=universal" as optional or removed in the future.
This is can
be
done easily if they are separate keys.

Best,
Jark


On Mon, 30 Mar 2020 at 14:45, Kurt Young <
ykt...@gmail.com
<mailto:ykt...@gmail.com>> wrote:
Hi Jark,

Thanks for the proposal, I'm +1 to the general idea.
However I have a
question about "version",
in the old design, the version seems to be aimed for
tracking
property
version, with different
version, we could evolve these step by step without
breaking backward
compatibility. But in this
design, version is representing external system's
version, like
"0.11"
for
kafka, "6" or "7" for ES.
I'm not sure if this is necessary, what's the benefit of
using two
keys
instead of one, like "kafka-0.11"
or "ES6" directly? And how about the old capability
which
could let
us
evolving connector properties?

Best,
Kurt


On Mon, Mar 30, 2020 at 2:36 PM LakeShen
<shenleifight...@gmail.com <mailto:shenleifight...@gmail.com>>
wrote:

Hi Jark,

I am really looking forward to this feature. I think
this
feature
could simplify flink sql code,and at the same time ,
it could make the developer more easlier to config the
flink sql
WITH
options.

Now when I am using flink sql to write flink task ,
sometimes I
think
the
WITH options is too long for user.
For example,I config the kafka source connector
parameter,for
consumer
group and brokers parameter:

'connector.properties.0.key' = 'group.id
<http://group.id>'
, 'connector.properties.0.value' = 'xxx'
, 'connector.properties.1.key' =
'bootstrap.servers'
, 'connector.properties.1.value' = 'xxxxx'

I can understand this config , but for the flink fresh
man,maybe it
is confused for him.
In my thought, I am really looking forward to this
feature,thank
you
to
propose this feature.

Best wishes,
LakeShen


Jark Wu <imj...@gmail.com <mailto:imj...@gmail.com>>
2020年3月30日周一 下午2:02写道:
Hi everyone,

I want to start a discussion about further improve
and
simplify
our
current
connector porperty keys, aka WITH options.
Currently,
we have a
'connector.' prefix for many properties, but they
are
verbose,
and
we
see a
big inconsistency between the properties when
designing
FLIP-107.
So we propose to remove all the 'connector.' prefix
and
rename
'connector.type' to 'connector', 'format.type' to
'format'. So a
new
Kafka
DDL may look like this:

CREATE TABLE kafka_table (
...
) WITH (
'connector' = 'kafka',
'version' = '0.10',
'topic' = 'test-topic',
'startup-mode' = 'earliest-offset',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id <http://properties.group.id>'
=
'testGroup',
'format' = 'json',
'format.fail-on-missing-field' = 'false'
);

The new connector property key set will come
together
with new
Factory
inferface which is proposed in FLIP-95. Old
properties
are still
compatible
with their existing implementation. New properties
are only
available
in
new DynamicTableFactory implementations.

You can access the detailed FLIP here:



https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
Best,
Jark


--
Best, Jingsong Lee



--

Benchao Li
School of Electronics Engineering and Computer Science, Peking
University
Tel:+86-15650713730
Email:libenc...@gmail.com
<mailto:libenc...@gmail.com>;libenc...@pku.edu.cn
<mailto:libenc...@pku.edu.cn>



Reply via email to