Thanks for the update Danny. +1 from my side.

Regards,
Timo


On 07.04.20 13:25, Danny Chan wrote:
Hi, every ~

It seems that we all agree to drop the idea for white/black list for each
connector, and have a global config option to default disable this feature.

I have also discussed with Timo and Jark about the interface
TableSourceTable.Context.getExecutionOptions and finally we decide to
introduce a new interface CatalogTable#copy(Map<String, String>) to support
re-generate the table with new table options.

So let me summarize the current design broadly again:

    - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe the
    dynamic table options
    - There is no constraint on which option key can be used in the OPTIONS,
    that means, any option key is allowed, the factory would to the validation
    work finally
    - Introduce method CatalogTable#copy, we use this method to regenerate a
    new CatalogTable to find a table factory and creates table source/sink
    - There is a global config option to default disable this feature (if
    user uses OPTIONS, an exception throws to tell open the option)

I have updated the WIKI
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL>,
look forward to your suggestions ~

Jark Wu <imj...@gmail.com> 于2020年4月7日周二 上午11:24写道:

I'm fine to disable this feature by default and avoid
whitelisting/blacklisting. This simplifies a lot of things.

Regarding to TableSourceFactory#Context#getExecutionOptions, do we really
need this interface?
Should the connector factory be aware of the properties is merged with
hints or not?
What's the problem if we always get properties from
`CatalogTable#getProperties`?

Best,
Jark

On Tue, 7 Apr 2020 at 10:39, Kurt Young <ykt...@gmail.com> wrote:

Sounds like a reasonable compromise, disabling this feature by default
is a
way to protect
the vulnerability, and we can simplify the design quite a lot. We can
gather some users'
feedback to see whether further protections are necessary in the future.

Best,
Kurt


On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org> wrote:

I agree with Aljoscha. The length of this thread shows that this is
highly controversal. I think nobody really likes this feature 100% but
we could not find a better solution. I would consider it as a
nice-to-have improvement during a notebook/debugging session.

I would accept avoiding whitelisting/blacklisting if the feature is
disabled by default. And we make the merged properties available in a
separate TableSourceFactory#Context#getExecutionOptions as Danny
proposed.

What do you think?

Thanks,
Timo


On 06.04.20 09:59, Aljoscha Krettek wrote:
The reason I'm saying it should be disabled by default is that this
uses
hint syntax, and hints should really not change query semantics.

I'm quite strongly against hints that change query semantics, but if
we
disable this by default I would be (reluctantly) OK with the feature.
Companies that create deployments or set up the SQL environment for
users can enable the feature if they want.

But yes, I also agree that we don't need whitelisting/blacklisting,
which makes this a lot easier to do.

Best,
Aljoscha

On 06.04.20 04:27, Danny Chan wrote:
Hi, everyone ~

@Aljoscha @Timo

I think we're designing ourselves into ever more complicated
corners
here

I kindly agree that, personally didn't see strong reasons why we
should limit on each connector properties:

• we can define any table options for CREATE TABLE, why we treat the
dynamic options differently, we never consider any security problems
when create table, we should not either for dynamic table options
• If we do not have whitelist properties or blacklist properties,
the
table source creation work would be much easier, just used the
merged
options. There is no need to modify each connector to decide which
options could be overridden and how we merge them(the merge work is
redundant).
• @Timo, how about we support another interface
`TableSourceFactory#Context.getExecutionOptions`, we always use this
interface to get the options to create our table source. There is no
need to copy the catalog table itselt, we just need to generate our
Context correctly.
• @Aljoscha I agree to have a global config option, but I disagree
to
default disable it, a global default config would break the user
experience too much, especially when user want to modify the options
in a ad-hoc way.



I suggest to remove `TableSourceFactory#supportedHintOptions` or
`TableSourceFactory#forbiddenHintOptions` based on the fact that we
does not have black/white list for CREATE TABLE at all at lease for
current codebase.


@Timo (i have replied offline but allows to represent it here again)

The `TableSourceFactory#supportedHintOptions` doesn't work well for
3
reasons compared to `TableSourceFactory#forbiddenHintOptions`:
1. For key with wildcard, like connector.property.* , use a
blacklist
make us have the ability to disable some of the keys under that,
i.e.
connector.property.key1 , a whitelist can only match with prefix

2. We want the connectors to have the ability to disable format type
switch format.type but allows all the other properties, e.g.
format.*
without format.type(let's call it SET_B), if we use the whitelist,
we
have to enumerate all the specific format keys start with format
(SET_B), but with the old connector factories, we have no idea what
specific format keys it supports(there is either a format.* or
nothing).

3. Except the cases for 1 and 2, for normal keys(no wildcard), the
blacklist and whitelist has the same expressiveness, use blacklist
makes the code not too verbose to enumerate all the duplicate keys
with #supportedKeys .(Not very strong reason, but i think as a
connector developer, it makes sense)

Best,
Danny Chan
在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道:
Hi everyone,

@Aljoscha: I disagree with your approach because a `Catalog` can
return
a custom factory that is not using any properties. The hinting must
be
transparent to a factory. We should NOT modify the metadata
`CatalogTable` at any point in time after the catalog.

@Danny, @Jingsong: How about we stick to the original design that
we
wanted to vote on but use:

Set<String> supportedHintProperties()

This fits better to the old factory design. And for the new FLIP-95
factories we will use `ConfigOption` and provide good utilities for
merging with hints etc.

We can allow `"format.*"` in `supportedHintProperties()` to allow
hinting in formats.

What do you think?

Regards,
Timo


On 02.04.20 16:24, Aljoscha Krettek wrote:
I think we're designing ourselves into ever more complicated
corners
here. Maybe we need to take a step back and reconsider. What would
you
think about this (somewhat) simpler proposal:

- introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
CONNECTOR_PROPERTIES, depending on what naming we want to have for
this
in the future. This will simply overwrite all connector
properties,
the
table factories don't know about hints but simply work with the
properties that they are given

- this special hint is disabled by default and can be activated
with a
global option "foo.bazzle.connector-hints" (or something like
this)
which has a warning that describes that this can change query
semantics
etc.

That's it. This makes connector implementations a lot easier while
still
allowing inline configuration.

I still don't like using hint syntax at all for this, because I
strongly
maintain that hints should not change query syntax. In general
using
hints should be kept to a minimum because they usually point to
shortcomings in the system.

Best,
Aljoscha

On 02.04.20 06:06, Jingsong Li wrote:
Hi Dawid,

When a factory is instantiated it has access to the
CatalogTable,
therefore it has access to all the original properties. In turn
it
knows
the original format and can call
FormatFactory#supportedHintOptions().

Factory can only get CatalogTable when creating source or sink,
right? IIUC, TableFactory may be stateless too.
When invoking SourceFactory#supportedHintOptions(), it can not
get CatalogTable, so it is impossible to create FormatFactory?

Best,
Jingsong Lee

On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao....@gmail.com

wrote:

Hi, Dawid, thanks so much for the detail suggestions ~

1. Regarding the motivation:

I agree it's not a good suggested way based on the fact that we
have
better solution, but i think we can support override that as
long
as it
exists as one of the the table options. I would remove if from
the
motication part.

2. The options passes around during sql-to-rel conversion, right
after we
generate the RelOptTable (CatalogSourceTable#toRel in Flink),
this
is
indeed a push way method at least in the RelOptTable layer, then
we hand
over the options to TableSourceFactory with our own context,
which
is
fine
becuase TableSourceFactory#Context is the contact to pass around
these
table-about variables.

3. "We should not end up with an extreme example where we can
change the
connector type", i totally agree that, and i have listed the
"connector.type" as forbidden attribute in the WIKI. As for the
format, i
think the connector itself can/should control whether to
override
the
"format.type", that is one of the reason i change the
TableSourceFactory#supportedHintOpitons to
TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
limit the
format keys we want conveniently.

4. SQL Hints syntax.

I think the k and v in the hint_item should be QUOTED_STRING
(not
sure
if it is equivalent to string_literal).

I disagree, we at least should keep sync with our DDL: use the
string
literal as the key. We did also support the simple identifier
because
this
is the common hint syntax from Calcite, it does not hurt
anything
for
the
OPTIONS hint, the unsupported keys would validate fails.(If you
think
that
may cause some confuse, i can make the syntax pluggable for each
hint in
CALCITE 1.23)

We only supports OPTIONS hint in the FLIP, and i have changed
the
title to
"Supports dynamic table options", would make it more clear in
the
WIKI.

5. Yes, we also have this concerns from our offline discussion,
that is
one of the reason, why i change the
TableSourceFactory#supportedHintOpitons
to TableSourceFactory#forbiddenHintOpitons, i do not choose
Set<String>
supportedHintOptionKeys() with wildcard for 2 reasons:

     - The wildcard is still not descriptive, we can still not
forbidden one
of the properties among the wildcard properties, we can not
enable
or
disable them totally
     - ConfigOption is our new structure for keys, and it does
not
support
wildcard yet.

Best,
Danny Chan
在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
<dwysakow...@apache.org>,写道:
Hi,
Few comments from my side:
1. Regarding the motivation:
I think the example with changing the update-mode is not a good
one. In
the long term this should be done with EMIT CHANGELOG (discussed
in
FLIP-105).
Nitpicking: I would mention that it is rather for
debugging/ad-hoc
solution. I think this should not be a recommended way for
production
use
cases as it bypasses the Catalog, which should be the source of
truth.
2. I could not understand how the additional options will be
passed to
the TableSourceFactory. Could you elaborate a bit more on that?
I
see
there
is a Context interface that gives the options. But cannot find a
way
to get
the context itself in the factory. Moreover I think it would
make
more
sense to have rather a push based approach here. Something like
applyOptions(ReadableConfig) method.
3. As for the concerns Jingsong raised in the voting thread. I
think it
is not a big problem, but I agree this should be also
described. I
disagree
with "Connector don't know format information in TableFactory
before
obtains real properties, so it can not list any format
`supportedHintOptions`".
When a factory is instantiated it has access to the
CatalogTable,
therefore it has access to all the original properties. In turn
it
knows
the original format and can call
FormatFactory#supportedHintOptions().
The only case when this would not work would be if we allow
changing
the
format of the Table (e.g. from avro to parquet), which does not
sound
like
a good idea to me. I think this feature should not end up as a
way
to
declare a whole table inline in a SQL query, but should rather
be
a
simple
way for debugging queries. We should not end up with an extreme
example
where we do:
SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
'format.type' = 'json', ....) */
4. SQL Hints syntax.
I think the k and v in the hint_item should be QUOTED_STRING
(not
sure
if it is equivalent to string_literal). I think we should not
use
simple_identifier because this implies that we cannot use e.g.
any
SQL
keywords. Anyway it has nothing to do with identifiers. If I am
not
mistaken it is also how the options in the CREATE statement are
implemented.
What is the purpose of the remaining hint_item:
hint_name(hint_opt
[
,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
feeling it
does also suggests to support the whole Apache Calcite hint
system
without
specifying that explicitly. Is the intention of the FLIP to
support
choosing e.g. JOIN strategies through hints already? If it is so
it
should
be mentioned in the FLIP, imo.
5. I think something does not work around the
supportedHintOptions and
wildcards. How do you want to represent a wildcard key as a
ConfigOption? I
am not sure about that, just a though, maybe it make sense to
have
rather
Set<String> supportedHintOptionKeys()?
Best,
Dawid











Reply via email to