Hi, every ~ It seems that we all agree to drop the idea for white/black list for each connector, and have a global config option to default disable this feature.
I have also discussed with Timo and Jark about the interface TableSourceTable.Context.getExecutionOptions and finally we decide to introduce a new interface CatalogTable#copy(Map<String, String>) to support re-generate the table with new table options. So let me summarize the current design broadly again: - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe the dynamic table options - There is no constraint on which option key can be used in the OPTIONS, that means, any option key is allowed, the factory would to the validation work finally - Introduce method CatalogTable#copy, we use this method to regenerate a new CatalogTable to find a table factory and creates table source/sink - There is a global config option to default disable this feature (if user uses OPTIONS, an exception throws to tell open the option) I have updated the WIKI <https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL>, look forward to your suggestions ~ Jark Wu <imj...@gmail.com> 于2020年4月7日周二 上午11:24写道: > I'm fine to disable this feature by default and avoid > whitelisting/blacklisting. This simplifies a lot of things. > > Regarding to TableSourceFactory#Context#getExecutionOptions, do we really > need this interface? > Should the connector factory be aware of the properties is merged with > hints or not? > What's the problem if we always get properties from > `CatalogTable#getProperties`? > > Best, > Jark > > On Tue, 7 Apr 2020 at 10:39, Kurt Young <ykt...@gmail.com> wrote: > > > Sounds like a reasonable compromise, disabling this feature by default > is a > > way to protect > > the vulnerability, and we can simplify the design quite a lot. We can > > gather some users' > > feedback to see whether further protections are necessary in the future. > > > > Best, > > Kurt > > > > > > On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org> wrote: > > > > > I agree with Aljoscha. The length of this thread shows that this is > > > highly controversal. I think nobody really likes this feature 100% but > > > we could not find a better solution. I would consider it as a > > > nice-to-have improvement during a notebook/debugging session. > > > > > > I would accept avoiding whitelisting/blacklisting if the feature is > > > disabled by default. And we make the merged properties available in a > > > separate TableSourceFactory#Context#getExecutionOptions as Danny > > proposed. > > > > > > What do you think? > > > > > > Thanks, > > > Timo > > > > > > > > > On 06.04.20 09:59, Aljoscha Krettek wrote: > > > > The reason I'm saying it should be disabled by default is that this > > uses > > > > hint syntax, and hints should really not change query semantics. > > > > > > > > I'm quite strongly against hints that change query semantics, but if > we > > > > disable this by default I would be (reluctantly) OK with the feature. > > > > Companies that create deployments or set up the SQL environment for > > > > users can enable the feature if they want. > > > > > > > > But yes, I also agree that we don't need whitelisting/blacklisting, > > > > which makes this a lot easier to do. > > > > > > > > Best, > > > > Aljoscha > > > > > > > > On 06.04.20 04:27, Danny Chan wrote: > > > >> Hi, everyone ~ > > > >> > > > >> @Aljoscha @Timo > > > >> > > > >>> I think we're designing ourselves into ever more complicated > corners > > > >> here > > > >> > > > >> I kindly agree that, personally didn't see strong reasons why we > > > >> should limit on each connector properties: > > > >> > > > >> • we can define any table options for CREATE TABLE, why we treat the > > > >> dynamic options differently, we never consider any security problems > > > >> when create table, we should not either for dynamic table options > > > >> • If we do not have whitelist properties or blacklist properties, > the > > > >> table source creation work would be much easier, just used the > merged > > > >> options. There is no need to modify each connector to decide which > > > >> options could be overridden and how we merge them(the merge work is > > > >> redundant). > > > >> • @Timo, how about we support another interface > > > >> `TableSourceFactory#Context.getExecutionOptions`, we always use this > > > >> interface to get the options to create our table source. There is no > > > >> need to copy the catalog table itselt, we just need to generate our > > > >> Context correctly. > > > >> • @Aljoscha I agree to have a global config option, but I disagree > to > > > >> default disable it, a global default config would break the user > > > >> experience too much, especially when user want to modify the options > > > >> in a ad-hoc way. > > > >> > > > >> > > > >> > > > >> I suggest to remove `TableSourceFactory#supportedHintOptions` or > > > >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we > > > >> does not have black/white list for CREATE TABLE at all at lease for > > > >> current codebase. > > > >> > > > >> > > > >> @Timo (i have replied offline but allows to represent it here again) > > > >> > > > >> The `TableSourceFactory#supportedHintOptions` doesn't work well for > 3 > > > >> reasons compared to `TableSourceFactory#forbiddenHintOptions`: > > > >> 1. For key with wildcard, like connector.property.* , use a > blacklist > > > >> make us have the ability to disable some of the keys under that, > i.e. > > > >> connector.property.key1 , a whitelist can only match with prefix > > > >> > > > >> 2. We want the connectors to have the ability to disable format type > > > >> switch format.type but allows all the other properties, e.g. > format.* > > > >> without format.type(let's call it SET_B), if we use the whitelist, > we > > > >> have to enumerate all the specific format keys start with format > > > >> (SET_B), but with the old connector factories, we have no idea what > > > >> specific format keys it supports(there is either a format.* or > > nothing). > > > >> > > > >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the > > > >> blacklist and whitelist has the same expressiveness, use blacklist > > > >> makes the code not too verbose to enumerate all the duplicate keys > > > >> with #supportedKeys .(Not very strong reason, but i think as a > > > >> connector developer, it makes sense) > > > >> > > > >> Best, > > > >> Danny Chan > > > >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道: > > > >>> Hi everyone, > > > >>> > > > >>> @Aljoscha: I disagree with your approach because a `Catalog` can > > return > > > >>> a custom factory that is not using any properties. The hinting must > > be > > > >>> transparent to a factory. We should NOT modify the metadata > > > >>> `CatalogTable` at any point in time after the catalog. > > > >>> > > > >>> @Danny, @Jingsong: How about we stick to the original design that > we > > > >>> wanted to vote on but use: > > > >>> > > > >>> Set<String> supportedHintProperties() > > > >>> > > > >>> This fits better to the old factory design. And for the new FLIP-95 > > > >>> factories we will use `ConfigOption` and provide good utilities for > > > >>> merging with hints etc. > > > >>> > > > >>> We can allow `"format.*"` in `supportedHintProperties()` to allow > > > >>> hinting in formats. > > > >>> > > > >>> What do you think? > > > >>> > > > >>> Regards, > > > >>> Timo > > > >>> > > > >>> > > > >>> On 02.04.20 16:24, Aljoscha Krettek wrote: > > > >>>> I think we're designing ourselves into ever more complicated > corners > > > >>>> here. Maybe we need to take a step back and reconsider. What would > > you > > > >>>> think about this (somewhat) simpler proposal: > > > >>>> > > > >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or > > > >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for > > > this > > > >>>> in the future. This will simply overwrite all connector > properties, > > > the > > > >>>> table factories don't know about hints but simply work with the > > > >>>> properties that they are given > > > >>>> > > > >>>> - this special hint is disabled by default and can be activated > > with a > > > >>>> global option "foo.bazzle.connector-hints" (or something like > this) > > > >>>> which has a warning that describes that this can change query > > > semantics > > > >>>> etc. > > > >>>> > > > >>>> That's it. This makes connector implementations a lot easier while > > > >>>> still > > > >>>> allowing inline configuration. > > > >>>> > > > >>>> I still don't like using hint syntax at all for this, because I > > > >>>> strongly > > > >>>> maintain that hints should not change query syntax. In general > using > > > >>>> hints should be kept to a minimum because they usually point to > > > >>>> shortcomings in the system. > > > >>>> > > > >>>> Best, > > > >>>> Aljoscha > > > >>>> > > > >>>> On 02.04.20 06:06, Jingsong Li wrote: > > > >>>>> Hi Dawid, > > > >>>>> > > > >>>>>> When a factory is instantiated it has access to the > CatalogTable, > > > >>>>> therefore it has access to all the original properties. In turn > it > > > >>>>> knows > > > >>>>> the original format and can call > > > FormatFactory#supportedHintOptions(). > > > >>>>> > > > >>>>> Factory can only get CatalogTable when creating source or sink, > > > >>>>> right? IIUC, TableFactory may be stateless too. > > > >>>>> When invoking SourceFactory#supportedHintOptions(), it can not > > > >>>>> get CatalogTable, so it is impossible to create FormatFactory? > > > >>>>> > > > >>>>> Best, > > > >>>>> Jingsong Lee > > > >>>>> > > > >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao....@gmail.com > > > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~ > > > >>>>>> > > > >>>>>> 1. Regarding the motivation: > > > >>>>>> > > > >>>>>> I agree it's not a good suggested way based on the fact that we > > have > > > >>>>>> better solution, but i think we can support override that as > long > > > >>>>>> as it > > > >>>>>> exists as one of the the table options. I would remove if from > the > > > >>>>>> motication part. > > > >>>>>> > > > >>>>>> 2. The options passes around during sql-to-rel conversion, right > > > >>>>>> after we > > > >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), > this > > > is > > > >>>>>> indeed a push way method at least in the RelOptTable layer, then > > > >>>>>> we hand > > > >>>>>> over the options to TableSourceFactory with our own context, > which > > > is > > > >>>>>> fine > > > >>>>>> becuase TableSourceFactory#Context is the contact to pass around > > > >>>>>> these > > > >>>>>> table-about variables. > > > >>>>>> > > > >>>>>> 3. "We should not end up with an extreme example where we can > > > >>>>>> change the > > > >>>>>> connector type", i totally agree that, and i have listed the > > > >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the > > > >>>>>> format, i > > > >>>>>> think the connector itself can/should control whether to > override > > > the > > > >>>>>> "format.type", that is one of the reason i change the > > > >>>>>> TableSourceFactory#supportedHintOpitons to > > > >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can > > > >>>>>> limit the > > > >>>>>> format keys we want conveniently. > > > >>>>>> > > > >>>>>> 4. SQL Hints syntax. > > > >>>>>> > > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING > (not > > > >>>>>>> sure > > > >>>>>> if it is equivalent to string_literal). > > > >>>>>> > > > >>>>>> I disagree, we at least should keep sync with our DDL: use the > > > string > > > >>>>>> literal as the key. We did also support the simple identifier > > > because > > > >>>>>> this > > > >>>>>> is the common hint syntax from Calcite, it does not hurt > anything > > > for > > > >>>>>> the > > > >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you > > > think > > > >>>>>> that > > > >>>>>> may cause some confuse, i can make the syntax pluggable for each > > > >>>>>> hint in > > > >>>>>> CALCITE 1.23) > > > >>>>>> > > > >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed > the > > > >>>>>> title to > > > >>>>>> "Supports dynamic table options", would make it more clear in > the > > > >>>>>> WIKI. > > > >>>>>> > > > >>>>>> 5. Yes, we also have this concerns from our offline discussion, > > > >>>>>> that is > > > >>>>>> one of the reason, why i change the > > > >>>>>> TableSourceFactory#supportedHintOpitons > > > >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose > > > >>>>>> Set<String> > > > >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons: > > > >>>>>> > > > >>>>>> - The wildcard is still not descriptive, we can still not > > > >>>>>> forbidden one > > > >>>>>> of the properties among the wildcard properties, we can not > enable > > > or > > > >>>>>> disable them totally > > > >>>>>> - ConfigOption is our new structure for keys, and it does > not > > > >>>>>> support > > > >>>>>> wildcard yet. > > > >>>>>> > > > >>>>>> Best, > > > >>>>>> Danny Chan > > > >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz > > > >>>>>> <dwysakow...@apache.org>,写道: > > > >>>>>>> Hi, > > > >>>>>>> Few comments from my side: > > > >>>>>>> 1. Regarding the motivation: > > > >>>>>>> I think the example with changing the update-mode is not a good > > > >>>>>>> one. In > > > >>>>>> the long term this should be done with EMIT CHANGELOG (discussed > > in > > > >>>>>> FLIP-105). > > > >>>>>>> Nitpicking: I would mention that it is rather for > > debugging/ad-hoc > > > >>>>>> solution. I think this should not be a recommended way for > > > production > > > >>>>>> use > > > >>>>>> cases as it bypasses the Catalog, which should be the source of > > > >>>>>> truth. > > > >>>>>>> 2. I could not understand how the additional options will be > > > >>>>>>> passed to > > > >>>>>> the TableSourceFactory. Could you elaborate a bit more on that? > I > > > see > > > >>>>>> there > > > >>>>>> is a Context interface that gives the options. But cannot find a > > way > > > >>>>>> to get > > > >>>>>> the context itself in the factory. Moreover I think it would > make > > > >>>>>> more > > > >>>>>> sense to have rather a push based approach here. Something like > > > >>>>>> applyOptions(ReadableConfig) method. > > > >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I > > > >>>>>>> think it > > > >>>>>> is not a big problem, but I agree this should be also > described. I > > > >>>>>> disagree > > > >>>>>> with "Connector don't know format information in TableFactory > > before > > > >>>>>> obtains real properties, so it can not list any format > > > >>>>>> `supportedHintOptions`". > > > >>>>>>> When a factory is instantiated it has access to the > CatalogTable, > > > >>>>>> therefore it has access to all the original properties. In turn > it > > > >>>>>> knows > > > >>>>>> the original format and can call > > > >>>>>> FormatFactory#supportedHintOptions(). > > > >>>>>>> The only case when this would not work would be if we allow > > > changing > > > >>>>>>> the > > > >>>>>> format of the Table (e.g. from avro to parquet), which does not > > > sound > > > >>>>>> like > > > >>>>>> a good idea to me. I think this feature should not end up as a > way > > > to > > > >>>>>> declare a whole table inline in a SQL query, but should rather > be > > a > > > >>>>>> simple > > > >>>>>> way for debugging queries. We should not end up with an extreme > > > >>>>>> example > > > >>>>>> where we do: > > > >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ..., > > > >>>>>> 'format.type' = 'json', ....) */ > > > >>>>>>> 4. SQL Hints syntax. > > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING > (not > > > >>>>>>> sure > > > >>>>>> if it is equivalent to string_literal). I think we should not > use > > > >>>>>> simple_identifier because this implies that we cannot use e.g. > any > > > >>>>>> SQL > > > >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am > > not > > > >>>>>> mistaken it is also how the options in the CREATE statement are > > > >>>>>> implemented. > > > >>>>>>> What is the purpose of the remaining hint_item: > > hint_name(hint_opt > > > [ > > > >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a > > > >>>>>> feeling it > > > >>>>>> does also suggests to support the whole Apache Calcite hint > system > > > >>>>>> without > > > >>>>>> specifying that explicitly. Is the intention of the FLIP to > > support > > > >>>>>> choosing e.g. JOIN strategies through hints already? If it is so > > it > > > >>>>>> should > > > >>>>>> be mentioned in the FLIP, imo. > > > >>>>>>> 5. I think something does not work around the > > > >>>>>>> supportedHintOptions and > > > >>>>>> wildcards. How do you want to represent a wildcard key as a > > > >>>>>> ConfigOption? I > > > >>>>>> am not sure about that, just a though, maybe it make sense to > have > > > >>>>>> rather > > > >>>>>> Set<String> supportedHintOptionKeys()? > > > >>>>>>> Best, > > > >>>>>>> Dawid > > > >>>>>> > > > >>>>> > > > >>>>> > > > >>> > > > >> > > > > > > > > >