Sounds like a reasonable compromise, disabling this feature by default is a
way to protect
the vulnerability, and we can simplify the design quite a lot. We can
gather some users'
feedback to see whether further protections are necessary in the future.

Best,
Kurt


On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org> wrote:

> I agree with Aljoscha. The length of this thread shows that this is
> highly controversal. I think nobody really likes this feature 100% but
> we could not find a better solution. I would consider it as a
> nice-to-have improvement during a notebook/debugging session.
>
> I would accept avoiding whitelisting/blacklisting if the feature is
> disabled by default. And we make the merged properties available in a
> separate TableSourceFactory#Context#getExecutionOptions as Danny proposed.
>
> What do you think?
>
> Thanks,
> Timo
>
>
> On 06.04.20 09:59, Aljoscha Krettek wrote:
> > The reason I'm saying it should be disabled by default is that this uses
> > hint syntax, and hints should really not change query semantics.
> >
> > I'm quite strongly against hints that change query semantics, but if we
> > disable this by default I would be (reluctantly) OK with the feature.
> > Companies that create deployments or set up the SQL environment for
> > users can enable the feature if they want.
> >
> > But yes, I also agree that we don't need whitelisting/blacklisting,
> > which makes this a lot easier to do.
> >
> > Best,
> > Aljoscha
> >
> > On 06.04.20 04:27, Danny Chan wrote:
> >> Hi, everyone ~
> >>
> >> @Aljoscha @Timo
> >>
> >>> I think we're designing ourselves into ever more complicated corners
> >> here
> >>
> >> I kindly agree that, personally didn't see strong reasons why we
> >> should limit on each connector properties:
> >>
> >> • we can define any table options for CREATE TABLE, why we treat the
> >> dynamic options differently, we never consider any security problems
> >> when create table, we should not either for dynamic table options
> >> • If we do not have whitelist properties or blacklist properties, the
> >> table source creation work would be much easier, just used the merged
> >> options. There is no need to modify each connector to decide which
> >> options could be overridden and how we merge them(the merge work is
> >> redundant).
> >> • @Timo, how about we support another interface
> >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> >> interface to get the options to create our table source. There is no
> >> need to copy the catalog table itselt, we just need to generate our
> >> Context correctly.
> >> • @Aljoscha I agree to have a global config option, but I disagree to
> >> default disable it, a global default config would break the user
> >> experience too much, especially when user want to modify the options
> >> in a ad-hoc way.
> >>
> >>
> >>
> >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> >> does not have black/white list for CREATE TABLE at all at lease for
> >> current codebase.
> >>
> >>
> >> @Timo (i have replied offline but allows to represent it here again)
> >>
> >> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3
> >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> >> 1. For key with wildcard, like connector.property.* , use a blacklist
> >> make us have the ability to disable some of the keys under that, i.e.
> >> connector.property.key1 , a whitelist can only match with prefix
> >>
> >> 2. We want the connectors to have the ability to disable format type
> >> switch format.type but allows all the other properties, e.g. format.*
> >> without format.type(let's call it SET_B), if we use the whitelist, we
> >> have to enumerate all the specific format keys start with format
> >> (SET_B), but with the old connector factories, we have no idea what
> >> specific format keys it supports(there is either a format.* or nothing).
> >>
> >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> >> blacklist and whitelist has the same expressiveness, use blacklist
> >> makes the code not too verbose to enumerate all the duplicate keys
> >> with #supportedKeys .(Not very strong reason, but i think as a
> >> connector developer, it makes sense)
> >>
> >> Best,
> >> Danny Chan
> >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道:
> >>> Hi everyone,
> >>>
> >>> @Aljoscha: I disagree with your approach because a `Catalog` can return
> >>> a custom factory that is not using any properties. The hinting must be
> >>> transparent to a factory. We should NOT modify the metadata
> >>> `CatalogTable` at any point in time after the catalog.
> >>>
> >>> @Danny, @Jingsong: How about we stick to the original design that we
> >>> wanted to vote on but use:
> >>>
> >>> Set<String> supportedHintProperties()
> >>>
> >>> This fits better to the old factory design. And for the new FLIP-95
> >>> factories we will use `ConfigOption` and provide good utilities for
> >>> merging with hints etc.
> >>>
> >>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> >>> hinting in formats.
> >>>
> >>> What do you think?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> >>>> I think we're designing ourselves into ever more complicated corners
> >>>> here. Maybe we need to take a step back and reconsider. What would you
> >>>> think about this (somewhat) simpler proposal:
> >>>>
> >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> this
> >>>> in the future. This will simply overwrite all connector properties,
> the
> >>>> table factories don't know about hints but simply work with the
> >>>> properties that they are given
> >>>>
> >>>> - this special hint is disabled by default and can be activated with a
> >>>> global option "foo.bazzle.connector-hints" (or something like this)
> >>>> which has a warning that describes that this can change query
> semantics
> >>>> etc.
> >>>>
> >>>> That's it. This makes connector implementations a lot easier while
> >>>> still
> >>>> allowing inline configuration.
> >>>>
> >>>> I still don't like using hint syntax at all for this, because I
> >>>> strongly
> >>>> maintain that hints should not change query syntax. In general using
> >>>> hints should be kept to a minimum because they usually point to
> >>>> shortcomings in the system.
> >>>>
> >>>> Best,
> >>>> Aljoscha
> >>>>
> >>>> On 02.04.20 06:06, Jingsong Li wrote:
> >>>>> Hi Dawid,
> >>>>>
> >>>>>> When a factory is instantiated it has access to the CatalogTable,
> >>>>> therefore it has access to all the original properties. In turn it
> >>>>> knows
> >>>>> the original format and can call
> FormatFactory#supportedHintOptions().
> >>>>>
> >>>>> Factory can only get CatalogTable when creating source or sink,
> >>>>> right? IIUC, TableFactory may be stateless too.
> >>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> >>>>> get CatalogTable, so it is impossible to create FormatFactory?
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao....@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> >>>>>>
> >>>>>> 1. Regarding the motivation:
> >>>>>>
> >>>>>> I agree it's not a good suggested way based on the fact that we have
> >>>>>> better solution, but i think we can support override that as long
> >>>>>> as it
> >>>>>> exists as one of the the table options. I would remove if from the
> >>>>>> motication part.
> >>>>>>
> >>>>>> 2. The options passes around during sql-to-rel conversion, right
> >>>>>> after we
> >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this
> is
> >>>>>> indeed a push way method at least in the RelOptTable layer, then
> >>>>>> we hand
> >>>>>> over the options to TableSourceFactory with our own context, which
> is
> >>>>>> fine
> >>>>>> becuase TableSourceFactory#Context is the contact to pass around
> >>>>>> these
> >>>>>> table-about variables.
> >>>>>>
> >>>>>> 3. "We should not end up with an extreme example where we can
> >>>>>> change the
> >>>>>> connector type", i totally agree that, and i have listed the
> >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> >>>>>> format, i
> >>>>>> think the connector itself can/should control whether to override
> the
> >>>>>> "format.type", that is one of the reason i change the
> >>>>>> TableSourceFactory#supportedHintOpitons to
> >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> >>>>>> limit the
> >>>>>> format keys we want conveniently.
> >>>>>>
> >>>>>> 4. SQL Hints syntax.
> >>>>>>
> >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> >>>>>>> sure
> >>>>>> if it is equivalent to string_literal).
> >>>>>>
> >>>>>> I disagree, we at least should keep sync with our DDL: use the
> string
> >>>>>> literal as the key. We did also support the simple identifier
> because
> >>>>>> this
> >>>>>> is the common hint syntax from Calcite, it does not hurt anything
> for
> >>>>>> the
> >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> think
> >>>>>> that
> >>>>>> may cause some confuse, i can make the syntax pluggable for each
> >>>>>> hint in
> >>>>>> CALCITE 1.23)
> >>>>>>
> >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed the
> >>>>>> title to
> >>>>>> "Supports dynamic table options", would make it more clear in the
> >>>>>> WIKI.
> >>>>>>
> >>>>>> 5. Yes, we also have this concerns from our offline discussion,
> >>>>>> that is
> >>>>>> one of the reason, why i change the
> >>>>>> TableSourceFactory#supportedHintOpitons
> >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> >>>>>> Set<String>
> >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> >>>>>>
> >>>>>>     - The wildcard is still not descriptive, we can still not
> >>>>>> forbidden one
> >>>>>> of the properties among the wildcard properties, we can not enable
> or
> >>>>>> disable them totally
> >>>>>>     - ConfigOption is our new structure for keys, and it does not
> >>>>>> support
> >>>>>> wildcard yet.
> >>>>>>
> >>>>>> Best,
> >>>>>> Danny Chan
> >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> >>>>>> <dwysakow...@apache.org>,写道:
> >>>>>>> Hi,
> >>>>>>> Few comments from my side:
> >>>>>>> 1. Regarding the motivation:
> >>>>>>> I think the example with changing the update-mode is not a good
> >>>>>>> one. In
> >>>>>> the long term this should be done with EMIT CHANGELOG (discussed in
> >>>>>> FLIP-105).
> >>>>>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
> >>>>>> solution. I think this should not be a recommended way for
> production
> >>>>>> use
> >>>>>> cases as it bypasses the Catalog, which should be the source of
> >>>>>> truth.
> >>>>>>> 2. I could not understand how the additional options will be
> >>>>>>> passed to
> >>>>>> the TableSourceFactory. Could you elaborate a bit more on that? I
> see
> >>>>>> there
> >>>>>> is a Context interface that gives the options. But cannot find a way
> >>>>>> to get
> >>>>>> the context itself in the factory. Moreover I think it would make
> >>>>>> more
> >>>>>> sense to have rather a push based approach here. Something like
> >>>>>> applyOptions(ReadableConfig) method.
> >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> >>>>>>> think it
> >>>>>> is not a big problem, but I agree this should be also described. I
> >>>>>> disagree
> >>>>>> with "Connector don't know format information in TableFactory before
> >>>>>> obtains real properties, so it can not list any format
> >>>>>> `supportedHintOptions`".
> >>>>>>> When a factory is instantiated it has access to the CatalogTable,
> >>>>>> therefore it has access to all the original properties. In turn it
> >>>>>> knows
> >>>>>> the original format and can call
> >>>>>> FormatFactory#supportedHintOptions().
> >>>>>>> The only case when this would not work would be if we allow
> changing
> >>>>>>> the
> >>>>>> format of the Table (e.g. from avro to parquet), which does not
> sound
> >>>>>> like
> >>>>>> a good idea to me. I think this feature should not end up as a way
> to
> >>>>>> declare a whole table inline in a SQL query, but should rather be a
> >>>>>> simple
> >>>>>> way for debugging queries. We should not end up with an extreme
> >>>>>> example
> >>>>>> where we do:
> >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> >>>>>> 'format.type' = 'json', ....) */
> >>>>>>> 4. SQL Hints syntax.
> >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> >>>>>>> sure
> >>>>>> if it is equivalent to string_literal). I think we should not use
> >>>>>> simple_identifier because this implies that we cannot use e.g. any
> >>>>>> SQL
> >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am not
> >>>>>> mistaken it is also how the options in the CREATE statement are
> >>>>>> implemented.
> >>>>>>> What is the purpose of the remaining hint_item: hint_name(hint_opt
> [
> >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> >>>>>> feeling it
> >>>>>> does also suggests to support the whole Apache Calcite hint system
> >>>>>> without
> >>>>>> specifying that explicitly. Is the intention of the FLIP to support
> >>>>>> choosing e.g. JOIN strategies through hints already? If it is so it
> >>>>>> should
> >>>>>> be mentioned in the FLIP, imo.
> >>>>>>> 5. I think something does not work around the
> >>>>>>> supportedHintOptions and
> >>>>>> wildcards. How do you want to represent a wildcard key as a
> >>>>>> ConfigOption? I
> >>>>>> am not sure about that, just a though, maybe it make sense to have
> >>>>>> rather
> >>>>>> Set<String> supportedHintOptionKeys()?
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
>
>

Reply via email to