+1, LGTM

Best,
Kurt


On Wed, Apr 8, 2020 at 10:27 AM Jark Wu <imj...@gmail.com> wrote:

> Thanks for the summary Danny. +1 to the new proposal.
>
> I have a minor concern about the global configuration
> `table.optimizer.dynamic-table-options.enabled`, does it belong to
> optimizer?
> From my point of view, it is just an API to set table options and uses
> Calcite in the implementation.
> I'm also thinking about what's the name of other configurations, e.g
> time-zone, code-gen length, state ttl.
> Should they prefix with "optimizer" or "exec" or something else or nothing?
>
> Best,
> Jark
>
> On Tue, 7 Apr 2020 at 23:17, Timo Walther <twal...@apache.org> wrote:
>
> > Thanks for the update Danny. +1 from my side.
> >
> > Regards,
> > Timo
> >
> >
> > On 07.04.20 13:25, Danny Chan wrote:
> > > Hi, every ~
> > >
> > > It seems that we all agree to drop the idea for white/black list for
> each
> > > connector, and have a global config option to default disable this
> > feature.
> > >
> > > I have also discussed with Timo and Jark about the interface
> > > TableSourceTable.Context.getExecutionOptions and finally we decide to
> > > introduce a new interface CatalogTable#copy(Map<String, String>) to
> > support
> > > re-generate the table with new table options.
> > >
> > > So let me summarize the current design broadly again:
> > >
> > >     - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe
> > the
> > >     dynamic table options
> > >     - There is no constraint on which option key can be used in the
> > OPTIONS,
> > >     that means, any option key is allowed, the factory would to the
> > validation
> > >     work finally
> > >     - Introduce method CatalogTable#copy, we use this method to
> > regenerate a
> > >     new CatalogTable to find a table factory and creates table
> > source/sink
> > >     - There is a global config option to default disable this feature
> (if
> > >     user uses OPTIONS, an exception throws to tell open the option)
> > >
> > > I have updated the WIKI
> > > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> > >,
> > > look forward to your suggestions ~
> > >
> > > Jark Wu <imj...@gmail.com> 于2020年4月7日周二 上午11:24写道:
> > >
> > >> I'm fine to disable this feature by default and avoid
> > >> whitelisting/blacklisting. This simplifies a lot of things.
> > >>
> > >> Regarding to TableSourceFactory#Context#getExecutionOptions, do we
> > really
> > >> need this interface?
> > >> Should the connector factory be aware of the properties is merged with
> > >> hints or not?
> > >> What's the problem if we always get properties from
> > >> `CatalogTable#getProperties`?
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Tue, 7 Apr 2020 at 10:39, Kurt Young <ykt...@gmail.com> wrote:
> > >>
> > >>> Sounds like a reasonable compromise, disabling this feature by
> default
> > >> is a
> > >>> way to protect
> > >>> the vulnerability, and we can simplify the design quite a lot. We can
> > >>> gather some users'
> > >>> feedback to see whether further protections are necessary in the
> > future.
> > >>>
> > >>> Best,
> > >>> Kurt
> > >>>
> > >>>
> > >>> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org>
> > wrote:
> > >>>
> > >>>> I agree with Aljoscha. The length of this thread shows that this is
> > >>>> highly controversal. I think nobody really likes this feature 100%
> but
> > >>>> we could not find a better solution. I would consider it as a
> > >>>> nice-to-have improvement during a notebook/debugging session.
> > >>>>
> > >>>> I would accept avoiding whitelisting/blacklisting if the feature is
> > >>>> disabled by default. And we make the merged properties available in
> a
> > >>>> separate TableSourceFactory#Context#getExecutionOptions as Danny
> > >>> proposed.
> > >>>>
> > >>>> What do you think?
> > >>>>
> > >>>> Thanks,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 06.04.20 09:59, Aljoscha Krettek wrote:
> > >>>>> The reason I'm saying it should be disabled by default is that this
> > >>> uses
> > >>>>> hint syntax, and hints should really not change query semantics.
> > >>>>>
> > >>>>> I'm quite strongly against hints that change query semantics, but
> if
> > >> we
> > >>>>> disable this by default I would be (reluctantly) OK with the
> feature.
> > >>>>> Companies that create deployments or set up the SQL environment for
> > >>>>> users can enable the feature if they want.
> > >>>>>
> > >>>>> But yes, I also agree that we don't need whitelisting/blacklisting,
> > >>>>> which makes this a lot easier to do.
> > >>>>>
> > >>>>> Best,
> > >>>>> Aljoscha
> > >>>>>
> > >>>>> On 06.04.20 04:27, Danny Chan wrote:
> > >>>>>> Hi, everyone ~
> > >>>>>>
> > >>>>>> @Aljoscha @Timo
> > >>>>>>
> > >>>>>>> I think we're designing ourselves into ever more complicated
> > >> corners
> > >>>>>> here
> > >>>>>>
> > >>>>>> I kindly agree that, personally didn't see strong reasons why we
> > >>>>>> should limit on each connector properties:
> > >>>>>>
> > >>>>>> • we can define any table options for CREATE TABLE, why we treat
> the
> > >>>>>> dynamic options differently, we never consider any security
> problems
> > >>>>>> when create table, we should not either for dynamic table options
> > >>>>>> • If we do not have whitelist properties or blacklist properties,
> > >> the
> > >>>>>> table source creation work would be much easier, just used the
> > >> merged
> > >>>>>> options. There is no need to modify each connector to decide which
> > >>>>>> options could be overridden and how we merge them(the merge work
> is
> > >>>>>> redundant).
> > >>>>>> • @Timo, how about we support another interface
> > >>>>>> `TableSourceFactory#Context.getExecutionOptions`, we always use
> this
> > >>>>>> interface to get the options to create our table source. There is
> no
> > >>>>>> need to copy the catalog table itselt, we just need to generate
> our
> > >>>>>> Context correctly.
> > >>>>>> • @Aljoscha I agree to have a global config option, but I disagree
> > >> to
> > >>>>>> default disable it, a global default config would break the user
> > >>>>>> experience too much, especially when user want to modify the
> options
> > >>>>>> in a ad-hoc way.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > >>>>>> `TableSourceFactory#forbiddenHintOptions` based on the fact that
> we
> > >>>>>> does not have black/white list for CREATE TABLE at all at lease
> for
> > >>>>>> current codebase.
> > >>>>>>
> > >>>>>>
> > >>>>>> @Timo (i have replied offline but allows to represent it here
> again)
> > >>>>>>
> > >>>>>> The `TableSourceFactory#supportedHintOptions` doesn't work well
> for
> > >> 3
> > >>>>>> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > >>>>>> 1. For key with wildcard, like connector.property.* , use a
> > >> blacklist
> > >>>>>> make us have the ability to disable some of the keys under that,
> > >> i.e.
> > >>>>>> connector.property.key1 , a whitelist can only match with prefix
> > >>>>>>
> > >>>>>> 2. We want the connectors to have the ability to disable format
> type
> > >>>>>> switch format.type but allows all the other properties, e.g.
> > >> format.*
> > >>>>>> without format.type(let's call it SET_B), if we use the whitelist,
> > >> we
> > >>>>>> have to enumerate all the specific format keys start with format
> > >>>>>> (SET_B), but with the old connector factories, we have no idea
> what
> > >>>>>> specific format keys it supports(there is either a format.* or
> > >>> nothing).
> > >>>>>>
> > >>>>>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> > >>>>>> blacklist and whitelist has the same expressiveness, use blacklist
> > >>>>>> makes the code not too verbose to enumerate all the duplicate keys
> > >>>>>> with #supportedKeys .(Not very strong reason, but i think as a
> > >>>>>> connector developer, it makes sense)
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Danny Chan
> > >>>>>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道:
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> @Aljoscha: I disagree with your approach because a `Catalog` can
> > >>> return
> > >>>>>>> a custom factory that is not using any properties. The hinting
> must
> > >>> be
> > >>>>>>> transparent to a factory. We should NOT modify the metadata
> > >>>>>>> `CatalogTable` at any point in time after the catalog.
> > >>>>>>>
> > >>>>>>> @Danny, @Jingsong: How about we stick to the original design that
> > >> we
> > >>>>>>> wanted to vote on but use:
> > >>>>>>>
> > >>>>>>> Set<String> supportedHintProperties()
> > >>>>>>>
> > >>>>>>> This fits better to the old factory design. And for the new
> FLIP-95
> > >>>>>>> factories we will use `ConfigOption` and provide good utilities
> for
> > >>>>>>> merging with hints etc.
> > >>>>>>>
> > >>>>>>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> > >>>>>>> hinting in formats.
> > >>>>>>>
> > >>>>>>> What do you think?
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Timo
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > >>>>>>>> I think we're designing ourselves into ever more complicated
> > >> corners
> > >>>>>>>> here. Maybe we need to take a step back and reconsider. What
> would
> > >>> you
> > >>>>>>>> think about this (somewhat) simpler proposal:
> > >>>>>>>>
> > >>>>>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > >>>>>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have
> for
> > >>>> this
> > >>>>>>>> in the future. This will simply overwrite all connector
> > >> properties,
> > >>>> the
> > >>>>>>>> table factories don't know about hints but simply work with the
> > >>>>>>>> properties that they are given
> > >>>>>>>>
> > >>>>>>>> - this special hint is disabled by default and can be activated
> > >>> with a
> > >>>>>>>> global option "foo.bazzle.connector-hints" (or something like
> > >> this)
> > >>>>>>>> which has a warning that describes that this can change query
> > >>>> semantics
> > >>>>>>>> etc.
> > >>>>>>>>
> > >>>>>>>> That's it. This makes connector implementations a lot easier
> while
> > >>>>>>>> still
> > >>>>>>>> allowing inline configuration.
> > >>>>>>>>
> > >>>>>>>> I still don't like using hint syntax at all for this, because I
> > >>>>>>>> strongly
> > >>>>>>>> maintain that hints should not change query syntax. In general
> > >> using
> > >>>>>>>> hints should be kept to a minimum because they usually point to
> > >>>>>>>> shortcomings in the system.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Aljoscha
> > >>>>>>>>
> > >>>>>>>> On 02.04.20 06:06, Jingsong Li wrote:
> > >>>>>>>>> Hi Dawid,
> > >>>>>>>>>
> > >>>>>>>>>> When a factory is instantiated it has access to the
> > >> CatalogTable,
> > >>>>>>>>> therefore it has access to all the original properties. In turn
> > >> it
> > >>>>>>>>> knows
> > >>>>>>>>> the original format and can call
> > >>>> FormatFactory#supportedHintOptions().
> > >>>>>>>>>
> > >>>>>>>>> Factory can only get CatalogTable when creating source or sink,
> > >>>>>>>>> right? IIUC, TableFactory may be stateless too.
> > >>>>>>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> > >>>>>>>>> get CatalogTable, so it is impossible to create FormatFactory?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Jingsong Lee
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <
> yuzhao....@gmail.com
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> > >>>>>>>>>>
> > >>>>>>>>>> 1. Regarding the motivation:
> > >>>>>>>>>>
> > >>>>>>>>>> I agree it's not a good suggested way based on the fact that
> we
> > >>> have
> > >>>>>>>>>> better solution, but i think we can support override that as
> > >> long
> > >>>>>>>>>> as it
> > >>>>>>>>>> exists as one of the the table options. I would remove if from
> > >> the
> > >>>>>>>>>> motication part.
> > >>>>>>>>>>
> > >>>>>>>>>> 2. The options passes around during sql-to-rel conversion,
> right
> > >>>>>>>>>> after we
> > >>>>>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
> > >> this
> > >>>> is
> > >>>>>>>>>> indeed a push way method at least in the RelOptTable layer,
> then
> > >>>>>>>>>> we hand
> > >>>>>>>>>> over the options to TableSourceFactory with our own context,
> > >> which
> > >>>> is
> > >>>>>>>>>> fine
> > >>>>>>>>>> becuase TableSourceFactory#Context is the contact to pass
> around
> > >>>>>>>>>> these
> > >>>>>>>>>> table-about variables.
> > >>>>>>>>>>
> > >>>>>>>>>> 3. "We should not end up with an extreme example where we can
> > >>>>>>>>>> change the
> > >>>>>>>>>> connector type", i totally agree that, and i have listed the
> > >>>>>>>>>> "connector.type" as forbidden attribute in the WIKI. As for
> the
> > >>>>>>>>>> format, i
> > >>>>>>>>>> think the connector itself can/should control whether to
> > >> override
> > >>>> the
> > >>>>>>>>>> "format.type", that is one of the reason i change the
> > >>>>>>>>>> TableSourceFactory#supportedHintOpitons to
> > >>>>>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we
> can
> > >>>>>>>>>> limit the
> > >>>>>>>>>> format keys we want conveniently.
> > >>>>>>>>>>
> > >>>>>>>>>> 4. SQL Hints syntax.
> > >>>>>>>>>>
> > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> > >> (not
> > >>>>>>>>>>> sure
> > >>>>>>>>>> if it is equivalent to string_literal).
> > >>>>>>>>>>
> > >>>>>>>>>> I disagree, we at least should keep sync with our DDL: use the
> > >>>> string
> > >>>>>>>>>> literal as the key. We did also support the simple identifier
> > >>>> because
> > >>>>>>>>>> this
> > >>>>>>>>>> is the common hint syntax from Calcite, it does not hurt
> > >> anything
> > >>>> for
> > >>>>>>>>>> the
> > >>>>>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If
> you
> > >>>> think
> > >>>>>>>>>> that
> > >>>>>>>>>> may cause some confuse, i can make the syntax pluggable for
> each
> > >>>>>>>>>> hint in
> > >>>>>>>>>> CALCITE 1.23)
> > >>>>>>>>>>
> > >>>>>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
> > >> the
> > >>>>>>>>>> title to
> > >>>>>>>>>> "Supports dynamic table options", would make it more clear in
> > >> the
> > >>>>>>>>>> WIKI.
> > >>>>>>>>>>
> > >>>>>>>>>> 5. Yes, we also have this concerns from our offline
> discussion,
> > >>>>>>>>>> that is
> > >>>>>>>>>> one of the reason, why i change the
> > >>>>>>>>>> TableSourceFactory#supportedHintOpitons
> > >>>>>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> > >>>>>>>>>> Set<String>
> > >>>>>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> > >>>>>>>>>>
> > >>>>>>>>>>      - The wildcard is still not descriptive, we can still not
> > >>>>>>>>>> forbidden one
> > >>>>>>>>>> of the properties among the wildcard properties, we can not
> > >> enable
> > >>>> or
> > >>>>>>>>>> disable them totally
> > >>>>>>>>>>      - ConfigOption is our new structure for keys, and it does
> > >> not
> > >>>>>>>>>> support
> > >>>>>>>>>> wildcard yet.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Danny Chan
> > >>>>>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > >>>>>>>>>> <dwysakow...@apache.org>,写道:
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>> Few comments from my side:
> > >>>>>>>>>>> 1. Regarding the motivation:
> > >>>>>>>>>>> I think the example with changing the update-mode is not a
> good
> > >>>>>>>>>>> one. In
> > >>>>>>>>>> the long term this should be done with EMIT CHANGELOG
> (discussed
> > >>> in
> > >>>>>>>>>> FLIP-105).
> > >>>>>>>>>>> Nitpicking: I would mention that it is rather for
> > >>> debugging/ad-hoc
> > >>>>>>>>>> solution. I think this should not be a recommended way for
> > >>>> production
> > >>>>>>>>>> use
> > >>>>>>>>>> cases as it bypasses the Catalog, which should be the source
> of
> > >>>>>>>>>> truth.
> > >>>>>>>>>>> 2. I could not understand how the additional options will be
> > >>>>>>>>>>> passed to
> > >>>>>>>>>> the TableSourceFactory. Could you elaborate a bit more on
> that?
> > >> I
> > >>>> see
> > >>>>>>>>>> there
> > >>>>>>>>>> is a Context interface that gives the options. But cannot
> find a
> > >>> way
> > >>>>>>>>>> to get
> > >>>>>>>>>> the context itself in the factory. Moreover I think it would
> > >> make
> > >>>>>>>>>> more
> > >>>>>>>>>> sense to have rather a push based approach here. Something
> like
> > >>>>>>>>>> applyOptions(ReadableConfig) method.
> > >>>>>>>>>>> 3. As for the concerns Jingsong raised in the voting thread.
> I
> > >>>>>>>>>>> think it
> > >>>>>>>>>> is not a big problem, but I agree this should be also
> > >> described. I
> > >>>>>>>>>> disagree
> > >>>>>>>>>> with "Connector don't know format information in TableFactory
> > >>> before
> > >>>>>>>>>> obtains real properties, so it can not list any format
> > >>>>>>>>>> `supportedHintOptions`".
> > >>>>>>>>>>> When a factory is instantiated it has access to the
> > >> CatalogTable,
> > >>>>>>>>>> therefore it has access to all the original properties. In
> turn
> > >> it
> > >>>>>>>>>> knows
> > >>>>>>>>>> the original format and can call
> > >>>>>>>>>> FormatFactory#supportedHintOptions().
> > >>>>>>>>>>> The only case when this would not work would be if we allow
> > >>>> changing
> > >>>>>>>>>>> the
> > >>>>>>>>>> format of the Table (e.g. from avro to parquet), which does
> not
> > >>>> sound
> > >>>>>>>>>> like
> > >>>>>>>>>> a good idea to me. I think this feature should not end up as a
> > >> way
> > >>>> to
> > >>>>>>>>>> declare a whole table inline in a SQL query, but should rather
> > >> be
> > >>> a
> > >>>>>>>>>> simple
> > >>>>>>>>>> way for debugging queries. We should not end up with an
> extreme
> > >>>>>>>>>> example
> > >>>>>>>>>> where we do:
> > >>>>>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka',
> ...,
> > >>>>>>>>>> 'format.type' = 'json', ....) */
> > >>>>>>>>>>> 4. SQL Hints syntax.
> > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> > >> (not
> > >>>>>>>>>>> sure
> > >>>>>>>>>> if it is equivalent to string_literal). I think we should not
> > >> use
> > >>>>>>>>>> simple_identifier because this implies that we cannot use e.g.
> > >> any
> > >>>>>>>>>> SQL
> > >>>>>>>>>> keywords. Anyway it has nothing to do with identifiers. If I
> am
> > >>> not
> > >>>>>>>>>> mistaken it is also how the options in the CREATE statement
> are
> > >>>>>>>>>> implemented.
> > >>>>>>>>>>> What is the purpose of the remaining hint_item:
> > >>> hint_name(hint_opt
> > >>>> [
> > >>>>>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I
> got a
> > >>>>>>>>>> feeling it
> > >>>>>>>>>> does also suggests to support the whole Apache Calcite hint
> > >> system
> > >>>>>>>>>> without
> > >>>>>>>>>> specifying that explicitly. Is the intention of the FLIP to
> > >>> support
> > >>>>>>>>>> choosing e.g. JOIN strategies through hints already? If it is
> so
> > >>> it
> > >>>>>>>>>> should
> > >>>>>>>>>> be mentioned in the FLIP, imo.
> > >>>>>>>>>>> 5. I think something does not work around the
> > >>>>>>>>>>> supportedHintOptions and
> > >>>>>>>>>> wildcards. How do you want to represent a wildcard key as a
> > >>>>>>>>>> ConfigOption? I
> > >>>>>>>>>> am not sure about that, just a though, maybe it make sense to
> > >> have
> > >>>>>>>>>> rather
> > >>>>>>>>>> Set<String> supportedHintOptionKeys()?
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Dawid
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>

Reply via email to