Hi, every ~

It seems that we all agree to drop the idea for white/black list for each
connector, and have a global config option to default disable this feature.

I have also discussed with Timo and Jark about the interface
TableSourceTable.Context.getExecutionOptions and finally we decide to
introduce a new interface CatalogTable#copy(Map<String, String>) to support
re-generate the table with new table options.

So let me summarize the current design broadly again:

   - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe the
   dynamic table options
   - There is no constraint on which option key can be used in the OPTIONS,
   that means, any option key is allowed, the factory would to the validation
   work finally
   - Introduce method CatalogTable#copy, we use this method to regenerate a
   new CatalogTable to find a table factory and creates table source/sink
   - There is a global config option to default disable this feature (if
   user uses OPTIONS, an exception throws to tell open the option)

I have updated the WIKI
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL>,
look forward to your suggestions ~

Jark Wu <imj...@gmail.com> 于2020年4月7日周二 上午11:24写道:

> I'm fine to disable this feature by default and avoid
> whitelisting/blacklisting. This simplifies a lot of things.
>
> Regarding to TableSourceFactory#Context#getExecutionOptions, do we really
> need this interface?
> Should the connector factory be aware of the properties is merged with
> hints or not?
> What's the problem if we always get properties from
> `CatalogTable#getProperties`?
>
> Best,
> Jark
>
> On Tue, 7 Apr 2020 at 10:39, Kurt Young <ykt...@gmail.com> wrote:
>
> > Sounds like a reasonable compromise, disabling this feature by default
> is a
> > way to protect
> > the vulnerability, and we can simplify the design quite a lot. We can
> > gather some users'
> > feedback to see whether further protections are necessary in the future.
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org> wrote:
> >
> > > I agree with Aljoscha. The length of this thread shows that this is
> > > highly controversal. I think nobody really likes this feature 100% but
> > > we could not find a better solution. I would consider it as a
> > > nice-to-have improvement during a notebook/debugging session.
> > >
> > > I would accept avoiding whitelisting/blacklisting if the feature is
> > > disabled by default. And we make the merged properties available in a
> > > separate TableSourceFactory#Context#getExecutionOptions as Danny
> > proposed.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > > On 06.04.20 09:59, Aljoscha Krettek wrote:
> > > > The reason I'm saying it should be disabled by default is that this
> > uses
> > > > hint syntax, and hints should really not change query semantics.
> > > >
> > > > I'm quite strongly against hints that change query semantics, but if
> we
> > > > disable this by default I would be (reluctantly) OK with the feature.
> > > > Companies that create deployments or set up the SQL environment for
> > > > users can enable the feature if they want.
> > > >
> > > > But yes, I also agree that we don't need whitelisting/blacklisting,
> > > > which makes this a lot easier to do.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > On 06.04.20 04:27, Danny Chan wrote:
> > > >> Hi, everyone ~
> > > >>
> > > >> @Aljoscha @Timo
> > > >>
> > > >>> I think we're designing ourselves into ever more complicated
> corners
> > > >> here
> > > >>
> > > >> I kindly agree that, personally didn't see strong reasons why we
> > > >> should limit on each connector properties:
> > > >>
> > > >> • we can define any table options for CREATE TABLE, why we treat the
> > > >> dynamic options differently, we never consider any security problems
> > > >> when create table, we should not either for dynamic table options
> > > >> • If we do not have whitelist properties or blacklist properties,
> the
> > > >> table source creation work would be much easier, just used the
> merged
> > > >> options. There is no need to modify each connector to decide which
> > > >> options could be overridden and how we merge them(the merge work is
> > > >> redundant).
> > > >> • @Timo, how about we support another interface
> > > >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> > > >> interface to get the options to create our table source. There is no
> > > >> need to copy the catalog table itselt, we just need to generate our
> > > >> Context correctly.
> > > >> • @Aljoscha I agree to have a global config option, but I disagree
> to
> > > >> default disable it, a global default config would break the user
> > > >> experience too much, especially when user want to modify the options
> > > >> in a ad-hoc way.
> > > >>
> > > >>
> > > >>
> > > >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > > >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> > > >> does not have black/white list for CREATE TABLE at all at lease for
> > > >> current codebase.
> > > >>
> > > >>
> > > >> @Timo (i have replied offline but allows to represent it here again)
> > > >>
> > > >> The `TableSourceFactory#supportedHintOptions` doesn't work well for
> 3
> > > >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > > >> 1. For key with wildcard, like connector.property.* , use a
> blacklist
> > > >> make us have the ability to disable some of the keys under that,
> i.e.
> > > >> connector.property.key1 , a whitelist can only match with prefix
> > > >>
> > > >> 2. We want the connectors to have the ability to disable format type
> > > >> switch format.type but allows all the other properties, e.g.
> format.*
> > > >> without format.type(let's call it SET_B), if we use the whitelist,
> we
> > > >> have to enumerate all the specific format keys start with format
> > > >> (SET_B), but with the old connector factories, we have no idea what
> > > >> specific format keys it supports(there is either a format.* or
> > nothing).
> > > >>
> > > >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> > > >> blacklist and whitelist has the same expressiveness, use blacklist
> > > >> makes the code not too verbose to enumerate all the duplicate keys
> > > >> with #supportedKeys .(Not very strong reason, but i think as a
> > > >> connector developer, it makes sense)
> > > >>
> > > >> Best,
> > > >> Danny Chan
> > > >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道:
> > > >>> Hi everyone,
> > > >>>
> > > >>> @Aljoscha: I disagree with your approach because a `Catalog` can
> > return
> > > >>> a custom factory that is not using any properties. The hinting must
> > be
> > > >>> transparent to a factory. We should NOT modify the metadata
> > > >>> `CatalogTable` at any point in time after the catalog.
> > > >>>
> > > >>> @Danny, @Jingsong: How about we stick to the original design that
> we
> > > >>> wanted to vote on but use:
> > > >>>
> > > >>> Set<String> supportedHintProperties()
> > > >>>
> > > >>> This fits better to the old factory design. And for the new FLIP-95
> > > >>> factories we will use `ConfigOption` and provide good utilities for
> > > >>> merging with hints etc.
> > > >>>
> > > >>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> > > >>> hinting in formats.
> > > >>>
> > > >>> What do you think?
> > > >>>
> > > >>> Regards,
> > > >>> Timo
> > > >>>
> > > >>>
> > > >>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > > >>>> I think we're designing ourselves into ever more complicated
> corners
> > > >>>> here. Maybe we need to take a step back and reconsider. What would
> > you
> > > >>>> think about this (somewhat) simpler proposal:
> > > >>>>
> > > >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > > >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> > > this
> > > >>>> in the future. This will simply overwrite all connector
> properties,
> > > the
> > > >>>> table factories don't know about hints but simply work with the
> > > >>>> properties that they are given
> > > >>>>
> > > >>>> - this special hint is disabled by default and can be activated
> > with a
> > > >>>> global option "foo.bazzle.connector-hints" (or something like
> this)
> > > >>>> which has a warning that describes that this can change query
> > > semantics
> > > >>>> etc.
> > > >>>>
> > > >>>> That's it. This makes connector implementations a lot easier while
> > > >>>> still
> > > >>>> allowing inline configuration.
> > > >>>>
> > > >>>> I still don't like using hint syntax at all for this, because I
> > > >>>> strongly
> > > >>>> maintain that hints should not change query syntax. In general
> using
> > > >>>> hints should be kept to a minimum because they usually point to
> > > >>>> shortcomings in the system.
> > > >>>>
> > > >>>> Best,
> > > >>>> Aljoscha
> > > >>>>
> > > >>>> On 02.04.20 06:06, Jingsong Li wrote:
> > > >>>>> Hi Dawid,
> > > >>>>>
> > > >>>>>> When a factory is instantiated it has access to the
> CatalogTable,
> > > >>>>> therefore it has access to all the original properties. In turn
> it
> > > >>>>> knows
> > > >>>>> the original format and can call
> > > FormatFactory#supportedHintOptions().
> > > >>>>>
> > > >>>>> Factory can only get CatalogTable when creating source or sink,
> > > >>>>> right? IIUC, TableFactory may be stateless too.
> > > >>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> > > >>>>> get CatalogTable, so it is impossible to create FormatFactory?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jingsong Lee
> > > >>>>>
> > > >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao....@gmail.com
> >
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> > > >>>>>>
> > > >>>>>> 1. Regarding the motivation:
> > > >>>>>>
> > > >>>>>> I agree it's not a good suggested way based on the fact that we
> > have
> > > >>>>>> better solution, but i think we can support override that as
> long
> > > >>>>>> as it
> > > >>>>>> exists as one of the the table options. I would remove if from
> the
> > > >>>>>> motication part.
> > > >>>>>>
> > > >>>>>> 2. The options passes around during sql-to-rel conversion, right
> > > >>>>>> after we
> > > >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
> this
> > > is
> > > >>>>>> indeed a push way method at least in the RelOptTable layer, then
> > > >>>>>> we hand
> > > >>>>>> over the options to TableSourceFactory with our own context,
> which
> > > is
> > > >>>>>> fine
> > > >>>>>> becuase TableSourceFactory#Context is the contact to pass around
> > > >>>>>> these
> > > >>>>>> table-about variables.
> > > >>>>>>
> > > >>>>>> 3. "We should not end up with an extreme example where we can
> > > >>>>>> change the
> > > >>>>>> connector type", i totally agree that, and i have listed the
> > > >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> > > >>>>>> format, i
> > > >>>>>> think the connector itself can/should control whether to
> override
> > > the
> > > >>>>>> "format.type", that is one of the reason i change the
> > > >>>>>> TableSourceFactory#supportedHintOpitons to
> > > >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> > > >>>>>> limit the
> > > >>>>>> format keys we want conveniently.
> > > >>>>>>
> > > >>>>>> 4. SQL Hints syntax.
> > > >>>>>>
> > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> (not
> > > >>>>>>> sure
> > > >>>>>> if it is equivalent to string_literal).
> > > >>>>>>
> > > >>>>>> I disagree, we at least should keep sync with our DDL: use the
> > > string
> > > >>>>>> literal as the key. We did also support the simple identifier
> > > because
> > > >>>>>> this
> > > >>>>>> is the common hint syntax from Calcite, it does not hurt
> anything
> > > for
> > > >>>>>> the
> > > >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> > > think
> > > >>>>>> that
> > > >>>>>> may cause some confuse, i can make the syntax pluggable for each
> > > >>>>>> hint in
> > > >>>>>> CALCITE 1.23)
> > > >>>>>>
> > > >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
> the
> > > >>>>>> title to
> > > >>>>>> "Supports dynamic table options", would make it more clear in
> the
> > > >>>>>> WIKI.
> > > >>>>>>
> > > >>>>>> 5. Yes, we also have this concerns from our offline discussion,
> > > >>>>>> that is
> > > >>>>>> one of the reason, why i change the
> > > >>>>>> TableSourceFactory#supportedHintOpitons
> > > >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> > > >>>>>> Set<String>
> > > >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> > > >>>>>>
> > > >>>>>>     - The wildcard is still not descriptive, we can still not
> > > >>>>>> forbidden one
> > > >>>>>> of the properties among the wildcard properties, we can not
> enable
> > > or
> > > >>>>>> disable them totally
> > > >>>>>>     - ConfigOption is our new structure for keys, and it does
> not
> > > >>>>>> support
> > > >>>>>> wildcard yet.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Danny Chan
> > > >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > > >>>>>> <dwysakow...@apache.org>,写道:
> > > >>>>>>> Hi,
> > > >>>>>>> Few comments from my side:
> > > >>>>>>> 1. Regarding the motivation:
> > > >>>>>>> I think the example with changing the update-mode is not a good
> > > >>>>>>> one. In
> > > >>>>>> the long term this should be done with EMIT CHANGELOG (discussed
> > in
> > > >>>>>> FLIP-105).
> > > >>>>>>> Nitpicking: I would mention that it is rather for
> > debugging/ad-hoc
> > > >>>>>> solution. I think this should not be a recommended way for
> > > production
> > > >>>>>> use
> > > >>>>>> cases as it bypasses the Catalog, which should be the source of
> > > >>>>>> truth.
> > > >>>>>>> 2. I could not understand how the additional options will be
> > > >>>>>>> passed to
> > > >>>>>> the TableSourceFactory. Could you elaborate a bit more on that?
> I
> > > see
> > > >>>>>> there
> > > >>>>>> is a Context interface that gives the options. But cannot find a
> > way
> > > >>>>>> to get
> > > >>>>>> the context itself in the factory. Moreover I think it would
> make
> > > >>>>>> more
> > > >>>>>> sense to have rather a push based approach here. Something like
> > > >>>>>> applyOptions(ReadableConfig) method.
> > > >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> > > >>>>>>> think it
> > > >>>>>> is not a big problem, but I agree this should be also
> described. I
> > > >>>>>> disagree
> > > >>>>>> with "Connector don't know format information in TableFactory
> > before
> > > >>>>>> obtains real properties, so it can not list any format
> > > >>>>>> `supportedHintOptions`".
> > > >>>>>>> When a factory is instantiated it has access to the
> CatalogTable,
> > > >>>>>> therefore it has access to all the original properties. In turn
> it
> > > >>>>>> knows
> > > >>>>>> the original format and can call
> > > >>>>>> FormatFactory#supportedHintOptions().
> > > >>>>>>> The only case when this would not work would be if we allow
> > > changing
> > > >>>>>>> the
> > > >>>>>> format of the Table (e.g. from avro to parquet), which does not
> > > sound
> > > >>>>>> like
> > > >>>>>> a good idea to me. I think this feature should not end up as a
> way
> > > to
> > > >>>>>> declare a whole table inline in a SQL query, but should rather
> be
> > a
> > > >>>>>> simple
> > > >>>>>> way for debugging queries. We should not end up with an extreme
> > > >>>>>> example
> > > >>>>>> where we do:
> > > >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> > > >>>>>> 'format.type' = 'json', ....) */
> > > >>>>>>> 4. SQL Hints syntax.
> > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> (not
> > > >>>>>>> sure
> > > >>>>>> if it is equivalent to string_literal). I think we should not
> use
> > > >>>>>> simple_identifier because this implies that we cannot use e.g.
> any
> > > >>>>>> SQL
> > > >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am
> > not
> > > >>>>>> mistaken it is also how the options in the CREATE statement are
> > > >>>>>> implemented.
> > > >>>>>>> What is the purpose of the remaining hint_item:
> > hint_name(hint_opt
> > > [
> > > >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> > > >>>>>> feeling it
> > > >>>>>> does also suggests to support the whole Apache Calcite hint
> system
> > > >>>>>> without
> > > >>>>>> specifying that explicitly. Is the intention of the FLIP to
> > support
> > > >>>>>> choosing e.g. JOIN strategies through hints already? If it is so
> > it
> > > >>>>>> should
> > > >>>>>> be mentioned in the FLIP, imo.
> > > >>>>>>> 5. I think something does not work around the
> > > >>>>>>> supportedHintOptions and
> > > >>>>>> wildcards. How do you want to represent a wildcard key as a
> > > >>>>>> ConfigOption? I
> > > >>>>>> am not sure about that, just a though, maybe it make sense to
> have
> > > >>>>>> rather
> > > >>>>>> Set<String> supportedHintOptionKeys()?
> > > >>>>>>> Best,
> > > >>>>>>> Dawid
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Reply via email to