+1, LGTM Best, Kurt
On Wed, Apr 8, 2020 at 10:27 AM Jark Wu <imj...@gmail.com> wrote: > Thanks for the summary Danny. +1 to the new proposal. > > I have a minor concern about the global configuration > `table.optimizer.dynamic-table-options.enabled`, does it belong to > optimizer? > From my point of view, it is just an API to set table options and uses > Calcite in the implementation. > I'm also thinking about what's the name of other configurations, e.g > time-zone, code-gen length, state ttl. > Should they prefix with "optimizer" or "exec" or something else or nothing? > > Best, > Jark > > On Tue, 7 Apr 2020 at 23:17, Timo Walther <twal...@apache.org> wrote: > > > Thanks for the update Danny. +1 from my side. > > > > Regards, > > Timo > > > > > > On 07.04.20 13:25, Danny Chan wrote: > > > Hi, every ~ > > > > > > It seems that we all agree to drop the idea for white/black list for > each > > > connector, and have a global config option to default disable this > > feature. > > > > > > I have also discussed with Timo and Jark about the interface > > > TableSourceTable.Context.getExecutionOptions and finally we decide to > > > introduce a new interface CatalogTable#copy(Map<String, String>) to > > support > > > re-generate the table with new table options. > > > > > > So let me summarize the current design broadly again: > > > > > > - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe > > the > > > dynamic table options > > > - There is no constraint on which option key can be used in the > > OPTIONS, > > > that means, any option key is allowed, the factory would to the > > validation > > > work finally > > > - Introduce method CatalogTable#copy, we use this method to > > regenerate a > > > new CatalogTable to find a table factory and creates table > > source/sink > > > - There is a global config option to default disable this feature > (if > > > user uses OPTIONS, an exception throws to tell open the option) > > > > > > I have updated the WIKI > > > < > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL > > >, > > > look forward to your suggestions ~ > > > > > > Jark Wu <imj...@gmail.com> 于2020年4月7日周二 上午11:24写道: > > > > > >> I'm fine to disable this feature by default and avoid > > >> whitelisting/blacklisting. This simplifies a lot of things. > > >> > > >> Regarding to TableSourceFactory#Context#getExecutionOptions, do we > > really > > >> need this interface? > > >> Should the connector factory be aware of the properties is merged with > > >> hints or not? > > >> What's the problem if we always get properties from > > >> `CatalogTable#getProperties`? > > >> > > >> Best, > > >> Jark > > >> > > >> On Tue, 7 Apr 2020 at 10:39, Kurt Young <ykt...@gmail.com> wrote: > > >> > > >>> Sounds like a reasonable compromise, disabling this feature by > default > > >> is a > > >>> way to protect > > >>> the vulnerability, and we can simplify the design quite a lot. We can > > >>> gather some users' > > >>> feedback to see whether further protections are necessary in the > > future. > > >>> > > >>> Best, > > >>> Kurt > > >>> > > >>> > > >>> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <twal...@apache.org> > > wrote: > > >>> > > >>>> I agree with Aljoscha. The length of this thread shows that this is > > >>>> highly controversal. I think nobody really likes this feature 100% > but > > >>>> we could not find a better solution. I would consider it as a > > >>>> nice-to-have improvement during a notebook/debugging session. > > >>>> > > >>>> I would accept avoiding whitelisting/blacklisting if the feature is > > >>>> disabled by default. And we make the merged properties available in > a > > >>>> separate TableSourceFactory#Context#getExecutionOptions as Danny > > >>> proposed. > > >>>> > > >>>> What do you think? > > >>>> > > >>>> Thanks, > > >>>> Timo > > >>>> > > >>>> > > >>>> On 06.04.20 09:59, Aljoscha Krettek wrote: > > >>>>> The reason I'm saying it should be disabled by default is that this > > >>> uses > > >>>>> hint syntax, and hints should really not change query semantics. > > >>>>> > > >>>>> I'm quite strongly against hints that change query semantics, but > if > > >> we > > >>>>> disable this by default I would be (reluctantly) OK with the > feature. > > >>>>> Companies that create deployments or set up the SQL environment for > > >>>>> users can enable the feature if they want. > > >>>>> > > >>>>> But yes, I also agree that we don't need whitelisting/blacklisting, > > >>>>> which makes this a lot easier to do. > > >>>>> > > >>>>> Best, > > >>>>> Aljoscha > > >>>>> > > >>>>> On 06.04.20 04:27, Danny Chan wrote: > > >>>>>> Hi, everyone ~ > > >>>>>> > > >>>>>> @Aljoscha @Timo > > >>>>>> > > >>>>>>> I think we're designing ourselves into ever more complicated > > >> corners > > >>>>>> here > > >>>>>> > > >>>>>> I kindly agree that, personally didn't see strong reasons why we > > >>>>>> should limit on each connector properties: > > >>>>>> > > >>>>>> • we can define any table options for CREATE TABLE, why we treat > the > > >>>>>> dynamic options differently, we never consider any security > problems > > >>>>>> when create table, we should not either for dynamic table options > > >>>>>> • If we do not have whitelist properties or blacklist properties, > > >> the > > >>>>>> table source creation work would be much easier, just used the > > >> merged > > >>>>>> options. There is no need to modify each connector to decide which > > >>>>>> options could be overridden and how we merge them(the merge work > is > > >>>>>> redundant). > > >>>>>> • @Timo, how about we support another interface > > >>>>>> `TableSourceFactory#Context.getExecutionOptions`, we always use > this > > >>>>>> interface to get the options to create our table source. There is > no > > >>>>>> need to copy the catalog table itselt, we just need to generate > our > > >>>>>> Context correctly. > > >>>>>> • @Aljoscha I agree to have a global config option, but I disagree > > >> to > > >>>>>> default disable it, a global default config would break the user > > >>>>>> experience too much, especially when user want to modify the > options > > >>>>>> in a ad-hoc way. > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> I suggest to remove `TableSourceFactory#supportedHintOptions` or > > >>>>>> `TableSourceFactory#forbiddenHintOptions` based on the fact that > we > > >>>>>> does not have black/white list for CREATE TABLE at all at lease > for > > >>>>>> current codebase. > > >>>>>> > > >>>>>> > > >>>>>> @Timo (i have replied offline but allows to represent it here > again) > > >>>>>> > > >>>>>> The `TableSourceFactory#supportedHintOptions` doesn't work well > for > > >> 3 > > >>>>>> reasons compared to `TableSourceFactory#forbiddenHintOptions`: > > >>>>>> 1. For key with wildcard, like connector.property.* , use a > > >> blacklist > > >>>>>> make us have the ability to disable some of the keys under that, > > >> i.e. > > >>>>>> connector.property.key1 , a whitelist can only match with prefix > > >>>>>> > > >>>>>> 2. We want the connectors to have the ability to disable format > type > > >>>>>> switch format.type but allows all the other properties, e.g. > > >> format.* > > >>>>>> without format.type(let's call it SET_B), if we use the whitelist, > > >> we > > >>>>>> have to enumerate all the specific format keys start with format > > >>>>>> (SET_B), but with the old connector factories, we have no idea > what > > >>>>>> specific format keys it supports(there is either a format.* or > > >>> nothing). > > >>>>>> > > >>>>>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the > > >>>>>> blacklist and whitelist has the same expressiveness, use blacklist > > >>>>>> makes the code not too verbose to enumerate all the duplicate keys > > >>>>>> with #supportedKeys .(Not very strong reason, but i think as a > > >>>>>> connector developer, it makes sense) > > >>>>>> > > >>>>>> Best, > > >>>>>> Danny Chan > > >>>>>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <twal...@apache.org>,写道: > > >>>>>>> Hi everyone, > > >>>>>>> > > >>>>>>> @Aljoscha: I disagree with your approach because a `Catalog` can > > >>> return > > >>>>>>> a custom factory that is not using any properties. The hinting > must > > >>> be > > >>>>>>> transparent to a factory. We should NOT modify the metadata > > >>>>>>> `CatalogTable` at any point in time after the catalog. > > >>>>>>> > > >>>>>>> @Danny, @Jingsong: How about we stick to the original design that > > >> we > > >>>>>>> wanted to vote on but use: > > >>>>>>> > > >>>>>>> Set<String> supportedHintProperties() > > >>>>>>> > > >>>>>>> This fits better to the old factory design. And for the new > FLIP-95 > > >>>>>>> factories we will use `ConfigOption` and provide good utilities > for > > >>>>>>> merging with hints etc. > > >>>>>>> > > >>>>>>> We can allow `"format.*"` in `supportedHintProperties()` to allow > > >>>>>>> hinting in formats. > > >>>>>>> > > >>>>>>> What do you think? > > >>>>>>> > > >>>>>>> Regards, > > >>>>>>> Timo > > >>>>>>> > > >>>>>>> > > >>>>>>> On 02.04.20 16:24, Aljoscha Krettek wrote: > > >>>>>>>> I think we're designing ourselves into ever more complicated > > >> corners > > >>>>>>>> here. Maybe we need to take a step back and reconsider. What > would > > >>> you > > >>>>>>>> think about this (somewhat) simpler proposal: > > >>>>>>>> > > >>>>>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or > > >>>>>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have > for > > >>>> this > > >>>>>>>> in the future. This will simply overwrite all connector > > >> properties, > > >>>> the > > >>>>>>>> table factories don't know about hints but simply work with the > > >>>>>>>> properties that they are given > > >>>>>>>> > > >>>>>>>> - this special hint is disabled by default and can be activated > > >>> with a > > >>>>>>>> global option "foo.bazzle.connector-hints" (or something like > > >> this) > > >>>>>>>> which has a warning that describes that this can change query > > >>>> semantics > > >>>>>>>> etc. > > >>>>>>>> > > >>>>>>>> That's it. This makes connector implementations a lot easier > while > > >>>>>>>> still > > >>>>>>>> allowing inline configuration. > > >>>>>>>> > > >>>>>>>> I still don't like using hint syntax at all for this, because I > > >>>>>>>> strongly > > >>>>>>>> maintain that hints should not change query syntax. In general > > >> using > > >>>>>>>> hints should be kept to a minimum because they usually point to > > >>>>>>>> shortcomings in the system. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Aljoscha > > >>>>>>>> > > >>>>>>>> On 02.04.20 06:06, Jingsong Li wrote: > > >>>>>>>>> Hi Dawid, > > >>>>>>>>> > > >>>>>>>>>> When a factory is instantiated it has access to the > > >> CatalogTable, > > >>>>>>>>> therefore it has access to all the original properties. In turn > > >> it > > >>>>>>>>> knows > > >>>>>>>>> the original format and can call > > >>>> FormatFactory#supportedHintOptions(). > > >>>>>>>>> > > >>>>>>>>> Factory can only get CatalogTable when creating source or sink, > > >>>>>>>>> right? IIUC, TableFactory may be stateless too. > > >>>>>>>>> When invoking SourceFactory#supportedHintOptions(), it can not > > >>>>>>>>> get CatalogTable, so it is impossible to create FormatFactory? > > >>>>>>>>> > > >>>>>>>>> Best, > > >>>>>>>>> Jingsong Lee > > >>>>>>>>> > > >>>>>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan < > yuzhao....@gmail.com > > >>> > > >>>>>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~ > > >>>>>>>>>> > > >>>>>>>>>> 1. Regarding the motivation: > > >>>>>>>>>> > > >>>>>>>>>> I agree it's not a good suggested way based on the fact that > we > > >>> have > > >>>>>>>>>> better solution, but i think we can support override that as > > >> long > > >>>>>>>>>> as it > > >>>>>>>>>> exists as one of the the table options. I would remove if from > > >> the > > >>>>>>>>>> motication part. > > >>>>>>>>>> > > >>>>>>>>>> 2. The options passes around during sql-to-rel conversion, > right > > >>>>>>>>>> after we > > >>>>>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), > > >> this > > >>>> is > > >>>>>>>>>> indeed a push way method at least in the RelOptTable layer, > then > > >>>>>>>>>> we hand > > >>>>>>>>>> over the options to TableSourceFactory with our own context, > > >> which > > >>>> is > > >>>>>>>>>> fine > > >>>>>>>>>> becuase TableSourceFactory#Context is the contact to pass > around > > >>>>>>>>>> these > > >>>>>>>>>> table-about variables. > > >>>>>>>>>> > > >>>>>>>>>> 3. "We should not end up with an extreme example where we can > > >>>>>>>>>> change the > > >>>>>>>>>> connector type", i totally agree that, and i have listed the > > >>>>>>>>>> "connector.type" as forbidden attribute in the WIKI. As for > the > > >>>>>>>>>> format, i > > >>>>>>>>>> think the connector itself can/should control whether to > > >> override > > >>>> the > > >>>>>>>>>> "format.type", that is one of the reason i change the > > >>>>>>>>>> TableSourceFactory#supportedHintOpitons to > > >>>>>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we > can > > >>>>>>>>>> limit the > > >>>>>>>>>> format keys we want conveniently. > > >>>>>>>>>> > > >>>>>>>>>> 4. SQL Hints syntax. > > >>>>>>>>>> > > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING > > >> (not > > >>>>>>>>>>> sure > > >>>>>>>>>> if it is equivalent to string_literal). > > >>>>>>>>>> > > >>>>>>>>>> I disagree, we at least should keep sync with our DDL: use the > > >>>> string > > >>>>>>>>>> literal as the key. We did also support the simple identifier > > >>>> because > > >>>>>>>>>> this > > >>>>>>>>>> is the common hint syntax from Calcite, it does not hurt > > >> anything > > >>>> for > > >>>>>>>>>> the > > >>>>>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If > you > > >>>> think > > >>>>>>>>>> that > > >>>>>>>>>> may cause some confuse, i can make the syntax pluggable for > each > > >>>>>>>>>> hint in > > >>>>>>>>>> CALCITE 1.23) > > >>>>>>>>>> > > >>>>>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed > > >> the > > >>>>>>>>>> title to > > >>>>>>>>>> "Supports dynamic table options", would make it more clear in > > >> the > > >>>>>>>>>> WIKI. > > >>>>>>>>>> > > >>>>>>>>>> 5. Yes, we also have this concerns from our offline > discussion, > > >>>>>>>>>> that is > > >>>>>>>>>> one of the reason, why i change the > > >>>>>>>>>> TableSourceFactory#supportedHintOpitons > > >>>>>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose > > >>>>>>>>>> Set<String> > > >>>>>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons: > > >>>>>>>>>> > > >>>>>>>>>> - The wildcard is still not descriptive, we can still not > > >>>>>>>>>> forbidden one > > >>>>>>>>>> of the properties among the wildcard properties, we can not > > >> enable > > >>>> or > > >>>>>>>>>> disable them totally > > >>>>>>>>>> - ConfigOption is our new structure for keys, and it does > > >> not > > >>>>>>>>>> support > > >>>>>>>>>> wildcard yet. > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Danny Chan > > >>>>>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz > > >>>>>>>>>> <dwysakow...@apache.org>,写道: > > >>>>>>>>>>> Hi, > > >>>>>>>>>>> Few comments from my side: > > >>>>>>>>>>> 1. Regarding the motivation: > > >>>>>>>>>>> I think the example with changing the update-mode is not a > good > > >>>>>>>>>>> one. In > > >>>>>>>>>> the long term this should be done with EMIT CHANGELOG > (discussed > > >>> in > > >>>>>>>>>> FLIP-105). > > >>>>>>>>>>> Nitpicking: I would mention that it is rather for > > >>> debugging/ad-hoc > > >>>>>>>>>> solution. I think this should not be a recommended way for > > >>>> production > > >>>>>>>>>> use > > >>>>>>>>>> cases as it bypasses the Catalog, which should be the source > of > > >>>>>>>>>> truth. > > >>>>>>>>>>> 2. I could not understand how the additional options will be > > >>>>>>>>>>> passed to > > >>>>>>>>>> the TableSourceFactory. Could you elaborate a bit more on > that? > > >> I > > >>>> see > > >>>>>>>>>> there > > >>>>>>>>>> is a Context interface that gives the options. But cannot > find a > > >>> way > > >>>>>>>>>> to get > > >>>>>>>>>> the context itself in the factory. Moreover I think it would > > >> make > > >>>>>>>>>> more > > >>>>>>>>>> sense to have rather a push based approach here. Something > like > > >>>>>>>>>> applyOptions(ReadableConfig) method. > > >>>>>>>>>>> 3. As for the concerns Jingsong raised in the voting thread. > I > > >>>>>>>>>>> think it > > >>>>>>>>>> is not a big problem, but I agree this should be also > > >> described. I > > >>>>>>>>>> disagree > > >>>>>>>>>> with "Connector don't know format information in TableFactory > > >>> before > > >>>>>>>>>> obtains real properties, so it can not list any format > > >>>>>>>>>> `supportedHintOptions`". > > >>>>>>>>>>> When a factory is instantiated it has access to the > > >> CatalogTable, > > >>>>>>>>>> therefore it has access to all the original properties. In > turn > > >> it > > >>>>>>>>>> knows > > >>>>>>>>>> the original format and can call > > >>>>>>>>>> FormatFactory#supportedHintOptions(). > > >>>>>>>>>>> The only case when this would not work would be if we allow > > >>>> changing > > >>>>>>>>>>> the > > >>>>>>>>>> format of the Table (e.g. from avro to parquet), which does > not > > >>>> sound > > >>>>>>>>>> like > > >>>>>>>>>> a good idea to me. I think this feature should not end up as a > > >> way > > >>>> to > > >>>>>>>>>> declare a whole table inline in a SQL query, but should rather > > >> be > > >>> a > > >>>>>>>>>> simple > > >>>>>>>>>> way for debugging queries. We should not end up with an > extreme > > >>>>>>>>>> example > > >>>>>>>>>> where we do: > > >>>>>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', > ..., > > >>>>>>>>>> 'format.type' = 'json', ....) */ > > >>>>>>>>>>> 4. SQL Hints syntax. > > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING > > >> (not > > >>>>>>>>>>> sure > > >>>>>>>>>> if it is equivalent to string_literal). I think we should not > > >> use > > >>>>>>>>>> simple_identifier because this implies that we cannot use e.g. > > >> any > > >>>>>>>>>> SQL > > >>>>>>>>>> keywords. Anyway it has nothing to do with identifiers. If I > am > > >>> not > > >>>>>>>>>> mistaken it is also how the options in the CREATE statement > are > > >>>>>>>>>> implemented. > > >>>>>>>>>>> What is the purpose of the remaining hint_item: > > >>> hint_name(hint_opt > > >>>> [ > > >>>>>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I > got a > > >>>>>>>>>> feeling it > > >>>>>>>>>> does also suggests to support the whole Apache Calcite hint > > >> system > > >>>>>>>>>> without > > >>>>>>>>>> specifying that explicitly. Is the intention of the FLIP to > > >>> support > > >>>>>>>>>> choosing e.g. JOIN strategies through hints already? If it is > so > > >>> it > > >>>>>>>>>> should > > >>>>>>>>>> be mentioned in the FLIP, imo. > > >>>>>>>>>>> 5. I think something does not work around the > > >>>>>>>>>>> supportedHintOptions and > > >>>>>>>>>> wildcards. How do you want to represent a wildcard key as a > > >>>>>>>>>> ConfigOption? I > > >>>>>>>>>> am not sure about that, just a though, maybe it make sense to > > >> have > > >>>>>>>>>> rather > > >>>>>>>>>> Set<String> supportedHintOptionKeys()? > > >>>>>>>>>>> Best, > > >>>>>>>>>>> Dawid > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>> > > >>>>>> > > >>>> > > >>>> > > >>> > > >> > > > > > > > >