Thanks everyone for the feedback ~

- On whether the global config option belongs to `ExecutionConfigOptions` or
  `OptimizerConfigOptions`: I have no strong objections, switching to
  `OptimizerConfigOptions` is okay with me and I have updated the WIKI
- On white-list versus black-list: I have the same opinion as Timo, so white-list
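
A minimal sketch of the white-list idea discussed in this thread, assuming each connector declares the option keys that an OPTIONS hint may override. The class `HintOptionWhitelist`, its `merge` method and the option keys used here are hypothetical illustrations, not Flink APIs:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Flink): applies hint options under a white-list. */
final class HintOptionWhitelist {

    private HintOptionWhitelist() {}

    /**
     * Returns a copy of the DDL table options with the hint options applied on top.
     * Any hint key that the connector has not white-listed is rejected.
     */
    static Map<String, String> merge(
            Map<String, String> tableOptions,
            Map<String, String> hintOptions,
            Set<String> supportedHintOptions) {
        Map<String, String> merged = new HashMap<>(tableOptions);
        for (Map.Entry<String, String> hint : hintOptions.entrySet()) {
            if (!supportedHintOptions.contains(hint.getKey())) {
                throw new IllegalArgumentException(
                        "Option '" + hint.getKey() + "' may not be set via an OPTIONS hint");
            }
            merged.put(hint.getKey(), hint.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Options as they would come from the DDL (keys are made up for illustration).
        Map<String, String> ddl = new HashMap<>();
        ddl.put("connector.topic", "orders");
        ddl.put("connector.fetch-size", "100");

        // Options as they would come from /*+ OPTIONS('connector.fetch-size'='500') */.
        Map<String, String> hints = Collections.singletonMap("connector.fetch-size", "500");

        // By default nothing is exposed; this connector exposes exactly one key.
        Set<String> whitelist = Collections.singleton("connector.fetch-size");

        System.out.println(merge(ddl, hints, whitelist));
    }
}
```

With an empty white-list every hint key is rejected, which matches the conservative default suggested in the thread.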
I will start a vote soon if there are no other objections, thanks ~

Timo Walther <twal...@apache.org> wrote on Thu, Mar 26, 2020 at 6:31 PM:

> Hi everyone,
>
> it is not only about security concerns. Hint options should be
> well-defined. We had a couple of people that were concerned about
> changing the semantics with a concept that is called "hint". These
> options are more like "debugging options" while someone is developing a
> connector or using a notebook to quickly produce some rows.
>
> The final pipeline should use a temporary table instead. I suggest to
> use a whitelist and force people to think about what should be exposed
> as a hint. By default, no option should be exposed. It is better to be
> conservative here.
>
> Regards,
> Timo
>
>
> On 26.03.20 10:31, Danny Chan wrote:
> > Thanks Kurt for the suggestion ~
> >
> > In my opinion:
> > - There is no need for TableFormatFactory#supportedHintOptions because all
> > the format options can be configured dynamically, they have no security
> > issues
> > - Dynamic table options is not an optimization, it is more like an
> > execution behavior from my side
> >
> > Kurt Young <ykt...@gmail.com> wrote on Thu, Mar 26, 2020 at 4:47 PM:
> >
> >> Hi Danny,
> >>
> >> Thanks for the updates. I have 2 comments regarding the latest document:
> >>
> >> 1) I think we also need `supportedHintOptions` for `TableFormatFactory`
> >> 2) IMO "dynamic-table-options.enabled" should belong to
> >> `OptimizerConfigOptions`
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Thu, Mar 26, 2020 at 4:40 PM Timo Walther <twal...@apache.org> wrote:
> >>
> >>> Thanks for the update Danny. +1 for this proposal.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>> On 26.03.20 04:51, Danny Chan wrote:
> >>>> Thanks everyone who engaged in this discussion ~
> >>>>
> >>>> Our goal is "Supports Dynamic Table Options for Flink SQL".
> >>>> After an offline discussion with Kurt, Timo and Dawid, we have reached a
> >>>> final conclusion, here is the summary:
> >>>>
> >>>> - Use comment style syntax to specify the dynamic table options:
> >>>>   "/*+ OPTIONS(k1='v1', k2='v2') */"
> >>>> - Have constraints on the option keys: options that may bring in
> >>>>   security problems should not be allowed, e.g. the Kafka connector
> >>>>   zookeeper endpoint URL and topic name
> >>>> - Use a white-list to control the allowed options for each connector,
> >>>>   which is safer for future extension
> >>>> - We allow to enable/disable this feature globally
> >>>> - Implement based on the current code base first, and when FLIP-95 is
> >>>>   checked in, implement this feature based on the new interface
> >>>>
> >>>> Any suggestions are appreciated ~
> >>>>
> >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> >>>>
> >>>> Best,
> >>>> Danny Chan
> >>>>
> >>>> Jark Wu <imj...@gmail.com> wrote on Wed, Mar 18, 2020 at 10:38 PM:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> Sorry, but I'm not sure about the `supportedHintOptions`. I'm afraid it
> >>>>> doesn't solve the problems but increases some development and learning
> >>>>> burdens.
> >>>>>
> >>>>> # increase development and learning burden
> >>>>>
> >>>>> According to the discussion so far, we want to support overriding a subset
> >>>>> of options in hints which doesn't affect semantics.
> >>>>> With the `supportedHintOptions`, it's up to the connector developers to
> >>>>> decide which options will not affect semantics, and to be hint options.
> >>>>> However, the question is how to distinguish whether an option will *affect
> >>>>> semantics*? What happens if an option will affect semantics but is provided
> >>>>> as a hint option?
> >>>>> From my point of view, it's not easy to distinguish.
> >>>>> For example, the "format.ignore-parse-error" can be a very useful dynamic
> >>>>> option but that will affect semantics, because the result is different
> >>>>> (null vs exception).
> >>>>> Another example, the "connector.lookup.cache.*" options are also very
> >>>>> useful to tune jobs, however, they will also affect the job results. I can
> >>>>> come up with many more useful options that may affect semantics.
> >>>>>
> >>>>> I can see that the community will fall into endless discussion around "can
> >>>>> this option be a hint option?", "whether this option will affect semantics?".
> >>>>> You can also find that we already have different opinions on
> >>>>> "ignore-parse-error". Those discussions are a waste of time! That's not
> >>>>> what users want!
> >>>>> The problem is that users need this, this, this option and HOW to expose
> >>>>> them? We should focus on that.
> >>>>>
> >>>>> Then there could be two endings in the future:
> >>>>> 1) compromise on the usability, we drop the rule that hints don't affect
> >>>>> semantics, allow all the useful options in the hints list.
> >>>>> 2) stick to the rule, users will find this is a stumbling feature which
> >>>>> doesn't solve their problems.
> >>>>> And they will be surprised why this option can't be set, but the other
> >>>>> could. *semantic* is hard to be understood by users.
> >>>>>
> >>>>> # doesn't solve the problems
> >>>>>
> >>>>> I think the purpose of this FLIP is to allow users to quickly override some
> >>>>> connectors' properties to tune their jobs.
> >>>>> However, `supportedHintOptions` is off track. It only allows a subset of
> >>>>> options and for the users it's not *clear* which subset is allowed.
> >>>>>
> >>>>> Besides, I'm not sure `supportedHintOptions` can work well for all cases.
> >>>>> How could you support kafka properties (`connector.properties.*`) as hint
> >>>>> options?
> >>>>> Some kafka properties may affect semantics (bootstrap.servers),
> >>>>> some may not (max.poll.records). Besides, I think it's not possible to list
> >>>>> all the possible kafka properties [1].
> >>>>>
> >>>>> In summary, IMO, `supportedHintOptions`
> >>>>> (1) increases the complexity to develop a connector
> >>>>> (2) confuses users about which options can be used in hints and which
> >>>>> cannot, they have to check the docs again and again.
> >>>>> (3) doesn't solve the problems which we want to solve by this FLIP.
> >>>>>
> >>>>> I think we should avoid introducing some partial solutions. Otherwise, we
> >>>>> will be stuck in a loop of introduce new API -> deprecate API ->
> >>>>> introduce new API....
> >>>>>
> >>>>> I am personally in favor of an explicit WITH syntax after the table as a
> >>>>> part of the query, which was mentioned by Kurt before, e.g. SELECT * from T
> >>>>> WITH('key' = 'value') .
> >>>>> It allows users to dynamically set options which can affect semantics. It
> >>>>> will be very flexible to solve users' problems so far.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>> [1]: https://kafka.apache.org/documentation/#consumerconfigs
> >>>>>
> >>>>> On Wed, 18 Mar 2020 at 21:44, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>
> >>>>>> My POC is here for the hints options merge [1].
> >>>>>>
> >>>>>> Personally, I have no strong objections to splitting hints from the
> >>>>>> CatalogTable, the only con is a more complex implementation but the
> >>>>>> concept is clearer, and I have updated the WIKI.
> >>>>>>
> >>>>>> I think it would be nice if we can support the format “ignore-parse error”
> >>>>>> option key, the CSV source already has a key [2] and we can use that in the
> >>>>>> supportedHintOptions, for the common CSV and JSON formats, we can also give
> >>>>>> a support.
> >>>>>> This is the only kind of key in formats that “do not change the
> >>>>>> semantics” (somehow), what do you think about this ~
> >>>>>>
> >>>>>> [1] https://github.com/danny0405/flink/commit/5d925fa16c3c553423c4b7d93001521b8e6e6bee#diff-6e569a6dd124fd2091c18e2790fb49c5
> >>>>>> [2] https://github.com/apache/flink/blob/b83060dff6d403b6994b6646b3f29a374f599530/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sources/CsvTableSourceFactoryBase.java#L92
> >>>>>>
> >>>>>> Best,
> >>>>>> Danny Chan
> >>>>>> On Mar 18, 2020, 9:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> +1 to Kurt's suggestion. Let's just have it in source and sink factories
> >>>>>>> for now. We can still move this method up in the future. Currently, I
> >>>>>>> don't see a need for catalogs or formats. Because how would you target a
> >>>>>>> format in the query?
> >>>>>>>
> >>>>>>> @Danny: Can you send a link to your PoC? I'm very skeptical about
> >>>>>>> creating a new CatalogTable in planner. Actually CatalogTable should be
> >>>>>>> immutable between Catalog and Factory. Because a catalog can return its
> >>>>>>> own factory and fully control the instantiation. Depending on the
> >>>>>>> implementation, that means it can be possible that the catalog has
> >>>>>>> encoded more information in a concrete subclass implementing the
> >>>>>>> interface. I vote for separating the concerns of catalog information and
> >>>>>>> hints in the factory explicitly.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 18.03.20 05:41, Jingsong Li wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I am thinking we can provide hints to *table* related instances.
> >>>>>>>> - TableFormatFactory: of course we need hints support, there are many
> >>>>>>>> format options in DDL too.
> >>>>>>>> - catalog and module: I don't know, maybe in future we can provide some
> >>>>>>>> hints for them.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>> On Wed, Mar 18, 2020 at 12:28 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Yes, I think we should move the `supportedHintOptions` from TableFactory
> >>>>>>>>> to TableSourceFactory, and we also need to add the interface to
> >>>>>>>>> TableSinkFactory though because sink target table may also have hints
> >>>>>>>>> attached.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Danny Chan
> >>>>>>>>> On Mar 18, 2020, 11:08 AM +0800, Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>> Have one question for adding `supportedHintOptions` method to
> >>>>>>>>>> `TableFactory`. It seems `TableFactory` is a base factory interface for
> >>>>>>>>>> all *table module* related instances, such as catalog, module, format
> >>>>>>>>>> and so on. It's not created only for *table*. Is it possible to move it
> >>>>>>>>>> to `TableSourceFactory`?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Kurt
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Mar 18, 2020 at 10:59 AM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Timo ~
> >>>>>>>>>>>
> >>>>>>>>>>> For the naming itself, I also think the PROPERTIES is not that concise, so
> >>>>>>>>>>> +1 for OPTIONS (I had thought about that, but there are many places in
> >>>>>>>>>>> current Flink code that call it properties, e.g. the DescriptorProperties,
> >>>>>>>>>>> #getSupportedProperties), let’s use OPTIONS if this is our new preference.
> >>>>>>>>>>>
> >>>>>>>>>>> +1 to `Set<ConfigOption> supportedHintOptions()` because the ConfigOption
> >>>>>>>>>>> can take more info. AFAIK, Spark also calls them table options instead of
> >>>>>>>>>>> properties.
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> >>>>>>>>>>> In my local POC, I did create a new CatalogTable, and it works well for
> >>>>>>>>>>> current connectors, all the DDL tables would finally yield a CatalogTable
> >>>>>>>>>>> instance and we can apply the options to that (in the CatalogSourceTable
> >>>>>>>>>>> when we generate the TableSource), the pro is that we do not need to
> >>>>>>>>>>> modify the code of the connectors themselves. If we split the options from
> >>>>>>>>>>> CatalogTable, we may need to add some additional logic in each connector
> >>>>>>>>>>> factory in order to merge these properties (and the logic is almost the
> >>>>>>>>>>> same), what do you think about this?
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Danny Chan
> >>>>>>>>>>> On Mar 17, 2020, 10:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks for updating the FLIP. I think your current design is sufficient
> >>>>>>>>>>>> to separate hints from result-related properties.
> >>>>>>>>>>>>
> >>>>>>>>>>>> One remark to the naming itself: I would vote for calling the hints
> >>>>>>>>>>>> around table scan `OPTIONS('k'='v')`. We used the term "properties" in
> >>>>>>>>>>>> the past but since we want to unify the Flink configuration experience,
> >>>>>>>>>>>> we should use consistent naming and classes around `ConfigOptions`.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It would be nice to use `Set<ConfigOption> supportedHintOptions();` to
> >>>>>>>>>>>> start using config options instead of pure string properties. This will
> >>>>>>>>>>>> also allow us to generate documentation in the future around supported
> >>>>>>>>>>>> data types, ranges, etc. for options. At some point we would also like
> >>>>>>>>>>>> to drop `DescriptorProperties` class. "Options" is also used in the
> >>>>>>>>>>>> documentation [1] and in the SQL/MED standard [2].
> >>>>>>>>>>>>
> >>>>>>>>>>>> Furthermore, I would still vote for separating CatalogTable and hint
> >>>>>>>>>>>> options. Otherwise the planner would need to create a new CatalogTable
> >>>>>>>>>>>> instance which might not always be easy. We should offer them via:
> >>>>>>>>>>>>
> >>>>>>>>>>>> org.apache.flink.table.factories.TableSourceFactory.Context#getHints:
> >>>>>>>>>>>> ReadableConfig
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
> >>>>>>>>>>>> [2] https://wiki.postgresql.org/wiki/SQL/MED
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 12.03.20 15:06, Stephan Ewen wrote:
> >>>>>>>>>>>>> @Danny sounds good.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Maybe it is worth listing all the classes of problems that you want to
> >>>>>>>>>>>>> address and then look at each class and see if hints are a good default
> >>>>>>>>>>>>> solution or a good optional way of simplifying things?
> >>>>>>>>>>>>> The discussion has grown a lot and it is starting to be hard to
> >>>>>>>>>>>>> distinguish the parts where everyone agrees from the parts where there
> >>>>>>>>>>>>> are concerns.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Stephan ~
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We can remove the support for properties that may change the semantics
> >>>>>>>>>>>>>> of the query if you think that is a trouble.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> How about we support the /*+ properties() */ hint only for those
> >>>>>>>>>>>>>> optimization parameters, such as the fetch size of the source or
> >>>>>>>>>>>>>> something like that, does that make sense?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Mar 12, 2020 at 7:45 PM:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I think Bowen has actually put it very well.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> (1) Hints that change semantics look like trouble waiting to happen.
> >>>>>>>>>>>>>>> For example Kafka offset handling should be in filters. The Kafka
> >>>>>>>>>>>>>>> source should support predicate pushdown.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> (2) Hints should not be a workaround for current shortcomings. A lot of
> >>>>>>>>>>>>>>> what is suggested above sounds exactly like that. Working around
> >>>>>>>>>>>>>>> catalog/DDL shortcomings, missing exposure of metadata (offsets),
> >>>>>>>>>>>>>>> missing predicate pushdown in Kafka. Abusing a feature like hints now
> >>>>>>>>>>>>>>> as a quick fix for these issues, rather than fixing the root causes,
> >>>>>>>>>>>>>>> will very likely bite us back badly in the future.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Stephan
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> It seems this FLIP's name is somewhat misleading. From my understanding,
> >>>>>>>>>>>>>>>> this FLIP is trying to address the dynamic parameter issue, and table
> >>>>>>>>>>>>>>>> hints is the way we want to choose. I think we should focus on "what's
> >>>>>>>>>>>>>>>> the right way to solve dynamic property" instead of discussing "whether
> >>>>>>>>>>>>>>>> table hints can affect query semantics".
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For now, there are three proposed ways to achieve dynamic property:
> >>>>>>>>>>>>>>>> 1. FLIP-110: create temporary table xx like xx with (xxx)
> >>>>>>>>>>>>>>>> 2. use custom "from t with (xxx)" syntax
> >>>>>>>>>>>>>>>> 3. "Borrow" the table hints to have a special PROPERTIES hint.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The first one doesn't break anything, the only problem I see is that it
> >>>>>>>>>>>>>>>> is a little more verbose than the table hint approach. I can imagine
> >>>>>>>>>>>>>>>> when someone uses the SQL CLI to have a sql experience, it's quite often
> >>>>>>>>>>>>>>>> that he will modify the table property, some use cases I can think of:
> >>>>>>>>>>>>>>>> 1. the source contains some corrupted data, I want to turn on the
> >>>>>>>>>>>>>>>> "ignore-error" flag for certain formats.
> >>>>>>>>>>>>>>>> 2. I have a kafka table and want to see some sample data from the
> >>>>>>>>>>>>>>>> beginning, so I change the offset to "earliest", and then I want to
> >>>>>>>>>>>>>>>> observe the latest data which keeps coming in. I would write another
> >>>>>>>>>>>>>>>> query to select from the latest table.
> >>>>>>>>>>>>>>>> 3. I want my jdbc sink to flush data more eagerly so that I can observe
> >>>>>>>>>>>>>>>> the data from the database side.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Most of such use cases are quite ad-hoc. If every time I want to have a
> >>>>>>>>>>>>>>>> different experience, I need to create a temporary table and then also
> >>>>>>>>>>>>>>>> modify my query, it doesn't feel smooth. Embedding such dynamic
> >>>>>>>>>>>>>>>> properties into the query would give a better user experience.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Both 2 & 3 can make this happen. The con of #2 is breaking SQL
> >>>>>>>>>>>>>>>> compliance, and for #3, it only breaks some unwritten rules, but we can
> >>>>>>>>>>>>>>>> have an explanation on that. And I really doubt whether users would
> >>>>>>>>>>>>>>>> complain about this when they actually have a flexible and good
> >>>>>>>>>>>>>>>> experience using this.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> My tendency would be #3 > #1 > #2, what do you think?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks Aljoscha ~
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I agree that most query hints are optional as an optimizer
> >>>>>>>>>>>>>>>>> instruction, especially for the traditional RDBMS.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> But, just like BenChao said, Flink as a computation engine has many
> >>>>>>>>>>>>>>>>> different kinds of data sources, thus, dynamic parameters like
> >>>>>>>>>>>>>>>>> start_offset can only bind to each table scope, we can not set a session
> >>>>>>>>>>>>>>>>> config like KSQL because they are all about Kafka:
> >>>>>>>>>>>>>>>>>> SET 'auto.offset.reset'='earliest';
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thus the most flexible way to set up these dynamic params is to bind to
> >>>>>>>>>>>>>>>>> the table scope in the query when we want to override something, so we
> >>>>>>>>>>>>>>>>> have these solutions above (with pros and cons from my side):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> • 1. Select * from t(offset=123) (from Timo)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>>>>> - Parameters are part of the main query
> >>>>>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>>>>> - Not SQL compliant
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> • 2. Select * from t /*+ PROPERTIES(offset=123) */ (from me)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>>>>> - SQL compliant because it is nested in the comments
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>>>>> - Parameters are not part of the main query
> >>>>>>>>>>>>>>>>> - Cryptic syntax for new users
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The biggest problem for the hints way may be the “if hints must be
> >>>>>>>>>>>>>>>>> optional”, actually we have thought about 1 for a while but aborted
> >>>>>>>>>>>>>>>>> because it breaks the SQL standard too much. And we replaced it with 2,
> >>>>>>>>>>>>>>>>> because the hints syntax does not break the SQL standard (nested in
> >>>>>>>>>>>>>>>>> comments).
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> What if we have the special /*+ PROPERTIES */ hint that allows
> >>>>>>>>>>>>>>>>> overriding some properties of a table dynamically, it does not break
> >>>>>>>>>>>>>>>>> anything, at least for current Flink use cases.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Planner hints are optional just because they are naturally enforcers of
> >>>>>>>>>>>>>>>>> the planner, most of them aim to instruct the optimizer, but the table
> >>>>>>>>>>>>>>>>> hints are a little different, table hints can specify the table meta
> >>>>>>>>>>>>>>>>> like index column, and it is very convenient to specify table properties.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Or shall we not call /*+ PROPERTIES(offset=123) */ a table hint, we can
> >>>>>>>>>>>>>>>>> call it table dynamic parameters.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>> On Mar 11, 2020, 9:20 PM +0800, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I don't understand this discussion. Hints, as I understand them, should
> >>>>>>>>>>>>>>>>>> work like this:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> - hints are *optional* advice for the optimizer to try and help it to
> >>>>>>>>>>>>>>>>>> find a good execution strategy
> >>>>>>>>>>>>>>>>>> - hints should not change query semantics, i.e. they should not change
> >>>>>>>>>>>>>>>>>> connector properties; executing a query with taking into account the
> >>>>>>>>>>>>>>>>>> hints *must* produce the same result as executing the query without
> >>>>>>>>>>>>>>>>>> taking into account the hints
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> From these simple requirements you can derive a solution that makes
> >>>>>>>>>>>>>>>>>> sense. I don't have a strong preference for the syntax but we should
> >>>>>>>>>>>>>>>>>> strive to be in line with prior work.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 11.03.20 11:53, Danny Chan wrote:
> >>>>>>>>>>>>>>>>>>> Thanks Timo for summarizing the 3 options ~
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I agree with Kurt that option2 is too complicated to use because:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> • As a Kafka topic consumer, the user must define the virtual column
> >>>>>>>>>>>>>>>>>>> for the start offset and must also apply a special filter predicate
> >>>>>>>>>>>>>>>>>>> after each query
> >>>>>>>>>>>>>>>>>>> • And for the internal implementation, the metadata column push down
> >>>>>>>>>>>>>>>>>>> is another hard topic, each kind of message queue may have its own
> >>>>>>>>>>>>>>>>>>> offset attribute, we need to consider the expression type for each
> >>>>>>>>>>>>>>>>>>> kind; the source also needs to recognize the constant column as a
> >>>>>>>>>>>>>>>>>>> config option (which is weird because usually what we push down is a
> >>>>>>>>>>>>>>>>>>> table column)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> For option 1 and option3, I think there is no difference, option1 is
> >>>>>>>>>>>>>>>>>>> also a hint syntax which was introduced in Sybase and adopted, then
> >>>>>>>>>>>>>>>>>>> deprecated, by MS-SQL in the 199X years because of the ambiguity.
> >>>>>>>>>>>>>>>>>>> Personally I prefer the /*+ */ style table hint over the WITH keyword
> >>>>>>>>>>>>>>>>>>> for these reasons:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> • We do not break the standard SQL, the hints are nested in SQL
> >>>>>>>>>>>>>>>>>>> comments
> >>>>>>>>>>>>>>>>>>> • We do not need to introduce an additional WITH keyword which may
> >>>>>>>>>>>>>>>>>>> appear anywhere in a query if we use that, because a table can be
> >>>>>>>>>>>>>>>>>>> referenced in all kinds of SQL contexts: INSERT/DELETE/FROM/JOIN ….
> >>>>>>>>>>>>>>>>>>> That would make our sql query deviate too much from the SQL standard
> >>>>>>>>>>>>>>>>>>> • We would have a uniform syntax for hints, the same as query hints,
> >>>>>>>>>>>>>>>>>>> one syntax fits all and is easier to use
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> And here is the reason why we chose a uniform Oracle style query
> >>>>>>>>>>>>>>>>>>> hint syntax, which was addressed by Julian Hyde when we designed the
> >>>>>>>>>>>>>>>>>>> syntax in the Calcite community:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I don’t much like the MSSQL-style syntax for table hints. It adds a
> >>>>>>>>>>>>>>>>>>> new use of the WITH keyword that is unrelated to the use of WITH for
> >>>>>>>>>>>>>>>>>>> common-table expressions.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> A historical note. Microsoft SQL Server inherited its hint syntax from
> >>>>>>>>>>>>>>>>>>> Sybase a very long time ago.
> >>>>>>>>>>>>>>>>>>> (See “Transact SQL Programming”[1], page 632, “Optimizer hints”. The
> >>>>>>>>>>>>>>>>>>> book was written in 1999, and covers Microsoft SQL Server 6.5 / 7.0
> >>>>>>>>>>>>>>>>>>> and Sybase Adaptive Server 11.5, but the syntax very likely predates
> >>>>>>>>>>>>>>>>>>> Sybase 4.3, from which Microsoft SQL Server was forked in 1993.)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Microsoft later added the WITH keyword to make it less ambiguous, and
> >>>>>>>>>>>>>>>>>>> has now deprecated the syntax that does not use WITH.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> They are forced to keep the syntax for backwards compatibility but
> >>>>>>>>>>>>>>>>>>> that doesn’t mean that we should shoulder their burden.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think formatted comments are the right container for hints because
> >>>>>>>>>>>>>>>>>>> it allows us to change the hint syntax without changing the SQL
> >>>>>>>>>>>>>>>>>>> parser, and makes clear that we are at liberty to ignore hints
> >>>>>>>>>>>>>>>>>>> entirely.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> [1] https://www.amazon.com/s?k=9781565924017
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>> On Mar 11, 2020, 4:03 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> it is true that our DDL is not standard compliant by using the WITH
> >>>>>>>>>>>>>>>>>>>> clause. Nevertheless, we aim for not diverging too much and the LIKE
> >>>>>>>>>>>>>>>>>>>> clause is an example of that. It will solve things like overwriting
> >>>>>>>>>>>>>>>>>>>> WATERMARKs, adding additional/modifying properties and inheriting
> >>>>>>>>>>>>>>>>>>>> schema.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Bowen is right that Flink's DDL is mixing 3 types of definition
> >>>>>>>>>>>>>>>>>>>> together. We are not the first ones that try to solve this. There is
> >>>>>>>>>>>>>>>>>>>> also the SQL MED standard [1] that tried to tackle this problem. I
> >>>>>>>>>>>>>>>>>>>> think it was not considered when designing the current DDL.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Currently, I see 3 options for handling Kafka offsets.
I will give some examples and look forward to feedback here:

*Option 1*: Runtime and semantic params as part of the query

`SELECT * FROM MyTable('offset'=123)`

Pros:
- Easy to add
- Parameters are part of the main query
- No complicated hinting syntax

Cons:
- Not SQL compliant

*Option 2*: Use metadata in query

`CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`

`SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`

Pros:
- SQL compliant in the query
- Access of metadata in the DDL which is required anyway
- Regular pushdown rules apply

Cons:
- Users need to add an additional column in the DDL

*Option 3*: Use hints for properties

`
SELECT *
FROM MyTable /*+ PROPERTIES('offset'=123) */
`

Pros:
- Easy to add

Cons:
- Parameters are not part of the main query
- Cryptic syntax for new users
- Not standard compliant.

If we go with this option, I would suggest making the hints available in a separate map instead of mixing them with statically defined properties, so that the factory can decide which properties have the right to be overwritten by the hints:

TableSourceFactory.Context.getQueryHints(): ReadableConfig

Regards,
Timo

[1] https://en.wikipedia.org/wiki/SQL/MED

On 11.03.20 07:21, Danny Chan wrote:
Thanks Bowen ~

I agree we should somehow categorize our connector parameters.

For type1, I’m already preparing a solution like the Confluent schema registry + Avro schema inference thing, so this may not be a problem in the near future.

For type3, I have some questions:

> "SELECT * FROM mykafka WHERE offset > 12pm yesterday”

Where does the offset column come from? Is it a virtual column from the table schema? You said that

> They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL

But then why can you reference it in the query? I’m confused by that, can you elaborate a little?

Best,
Danny Chan

On March 11, 2020 at 12:52 PM (+0800), Bowen Li <bowenl...@gmail.com> wrote:
Thanks Danny for kicking off the effort

The root cause of too much manual work is that Flink DDL has mixed 3 types of params together and doesn't handle each of them very well. Below is how I categorize them, with the corresponding solutions in my mind:

- type 1: Metadata of external data, like external endpoint/url, username/pwd, schemas, formats.

Such metadata are mostly already accessible in external system as long as endpoints and credentials are provided. Flink can get it thru catalogs, but we haven't had many catalogs yet and thus Flink just hasn't been able to leverage that. So the solution should be building more catalogs. Such params should be part of a Flink table DDL/definition, and not overridable by any means.

- type 2: Runtime params, like jdbc connector's fetch size, elasticsearch connector's bulk flush size.

Such params don't affect query results, but affect how results are produced (e.g. fast or slow, aka performance) - they are essentially execution and implementation details. They change often in exploration or development stages, but not quite frequently in well-defined long-running pipelines. They should always have default values and can be missing in query.
They can be part of a table DDL/definition, but should also be replaceable in a query - *this is what table "hints" in FLIP-113 should cover*.

- type 3: Semantic params, like kafka connector's start offset.

Such params affect query results - the semantics. They'd better be as filter conditions in WHERE clause that can be pushed down. They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL, nor be persisted in catalogs. If they will, users should create views to keep such params around (note this is different from variable substitution).

Take Flink-Kafka as an example.
Once we get these params right, here are the steps users need to do to develop and run a Flink job:
- configure a Flink ConfluentSchemaRegistry with url, username, and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified timestamp) in SQL CLI; Flink automatically retrieves all metadata of schema, file format, etc. and starts the job
- users want to make the job read Kafka topic faster, so it goes as "SELECT * FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
- done and satisfied, users submit it to production

Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2)", I think it's a nice-to-have feature, but not a strategically critical, long-term solution, because
1) It may seem promising at the current stage to solve the too-much-manual-work problem, but that's only because Flink hasn't leveraged catalogs well and handled the 3 types of params above properly.
Once we get the params types right, the LIKE syntax won't be that important, and will be just an easier way to create tables without retyping long fields like username and pwd.
2) Note that only some rare types of catalog can store k-v property pairs, so tables created this way often cannot be persisted. In the foreseeable future, such catalog will only be HiveCatalog, and not everyone has a Hive metastore. To be honest, without persistence, recreating tables every time this way is still a lot of keyboard typing.

Cheers,
Bowen

On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote:

If a specific connector wants to have such a parameter and read it out of configuration, then that's fine.
If we are talking about a configuration for all kinds of sources, I would be super careful about that.
It's true it can solve maybe 80% of cases, but it will also make the remaining 20% feel weird.

Best,
Kurt

On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote:

Hi Kurt,

#3 Regarding global offset:
I'm not saying the planner should use the global configuration to override connector properties.
But the connector should take this configuration and translate it into its client API.
AFAIK, almost all the message queues support earliest, latest, and a timestamp value as start point.
So we can support 3 options for this configuration: "earliest", "latest" and a timestamp string value.
Of course, this can't solve 100% of cases, but I guess it can solve 80% or 90% of them.
And the remaining cases, which I guess are not very common, can be resolved by the LIKE syntax.

Best,
Jark

On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote:

Good to have such lovely discussions. I also want to share some of my opinions.

#1 Regarding error handling: I also think ignoring invalid hints would be dangerous; maybe the simplest solution is to just throw an exception.

#2 Regarding property replacement: I don't think we should constrain ourselves to the meaning of the word "hint", and forbid it from modifying any properties which can affect query results. IMO `PROPERTIES` is one of the table hints, and a powerful one. It can modify properties located in DDL's WITH block.
But I also see the harm if we make it too flexible, like changing the kafka topic name with a hint. Such a use case is not common and sounds very dangerous to me. I would propose we have a map of hintable properties for each connector, and validate that all passed-in properties are actually hintable. Combined with #1 error handling, we can throw an exception once we receive an invalid property.

#3 Regarding global offset: I'm not sure it's feasible. Different connectors will have totally different properties to represent offset: some might be timestamps, some might be string literals like "earliest", and others might be just integers.
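
The per-connector map of hintable properties from #2, combined with the exception from #1, could look roughly like the following. This is only a sketch in Python, not Flink's Java API: `HINTABLE_PROPERTIES`, `InvalidHintError`, `validate_hint_options` and the listed option keys (other than `connector.startup-timestamp-millis`, which appears elsewhere in this thread) are hypothetical names chosen for illustration.

```python
class InvalidHintError(ValueError):
    """Raised when a hint sets a property the connector does not allow."""


# Hypothetical whitelist: connector name -> properties a hint may override.
# By default nothing is hintable; each connector opts in explicitly.
HINTABLE_PROPERTIES = {
    "kafka": {
        "connector.startup-mode",
        "connector.startup-timestamp-millis",
    },
    "jdbc": {"connector.read.fetch-size"},
}


def validate_hint_options(connector, hint_options):
    """Return a copy of hint_options, or raise if any key is not hintable."""
    allowed = HINTABLE_PROPERTIES.get(connector, set())
    rejected = sorted(key for key in hint_options if key not in allowed)
    if rejected:
        raise InvalidHintError(
            f"Properties not hintable for connector '{connector}': {rejected}"
        )
    return dict(hint_options)
```

With this shape, a hint that tries to change the topic name fails loudly instead of being silently ignored or silently applied.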

Best,
Kurt

On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote:

Hi everyone,

I want to jump in the discussion about the "dynamic start offset" problem.
First of all, I share the same concern with Timo and Fabian, that the "start offset" affects the query semantics, i.e. the query result.
But aren't "hints" just used for optimization, which should not affect the result?

I think the "dynamic start offset" is a very important usability problem which will be faced by many streaming platforms.
I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH ('connector.startup-timestamp-millis' = '1578538374471')" is verbose, what if we have 10 tables to join?

However, what I want to propose (should be another thread) is a global configuration to reset start offsets of all the source connectors in the query session, e.g. "table.sources.start-offset". This is possible now because `TableSourceFactory.Context` has a `getConfiguration` method to get the session configuration, which the factory can use to create an adapted TableSource.
Then we can also expose it to SQL CLI via the SET command, e.g. `SET 'table.sources.start-offset'='earliest';`, which is pretty simple and straightforward.

This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` which is very helpful IMO.
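
How a connector might interpret such a session-level value ("earliest", "latest", or a timestamp string) can be sketched as below. This is an illustration in Python under stated assumptions, not Flink code: the function name `parse_start_offset`, the `YYYY-MM-DD HH:MM:SS` timestamp format, and treating the timestamp as UTC are all assumptions made for the example.

```python
from datetime import datetime, timezone


def parse_start_offset(value):
    """Map a start-offset setting to (mode, epoch_millis).

    Returns ("earliest", None), ("latest", None) or ("timestamp", millis).
    Assumes timestamps look like "2020-03-10 12:00:00" and are in UTC.
    """
    text = value.strip()
    if text.lower() in ("earliest", "latest"):
        return (text.lower(), None)
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Fail loudly rather than silently falling back to a default.
        raise ValueError(f"Unsupported start offset: {value!r}") from None
    millis = int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return ("timestamp", millis)
```

Each connector would then translate the returned pair into whatever its own client API expects, which is where the per-connector differences raised earlier in the thread would have to be handled.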

Best,
Jark

On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

compared to the hints, FLIP-110 is fully compliant with the SQL standard.

I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is too verbose or awkward for the power of basically changing the entire connector. Usually, this statement would just precede the query in a multiline file. So it can be changed "in-place" like the hints you proposed.

Many companies have a well-defined set of tables that should be used. It would be dangerous if users can change the path or topic in a hint.
The catalog/catalog manager should be the entity that controls which tables exist and how they can be accessed.

> what’s the problem there if we use the table hints to support “start offset”?

IMHO it violates the meaning of a hint. According to the dictionary, a hint is "a statement that expresses indirectly what one prefers not to say explicitly". But offsets are a property that is very explicit.

If we go with the hint approach, it should be expressible in the TableSourceFactory which properties are supported for hinting. Or do you plan to offer those hints in a separate Map<String, String> that cannot overwrite existing properties? I think this would be a different story...
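
The "separate map" variant mentioned above, where hints are handed to the factory next to the static properties instead of overwriting them, might be sketched like this. It is a Python illustration only: `FactoryContext`, `get_query_hints`, `resolve_start_offset`, the `offset` key and the `"latest"` fallback are hypothetical and do not reflect an existing Flink interface.

```python
from types import MappingProxyType


class FactoryContext:
    """Carries static table properties and query hints as two read-only maps."""

    def __init__(self, table_properties, query_hints):
        # Copy and wrap so neither map can be mutated or merged by accident.
        self._table_properties = MappingProxyType(dict(table_properties))
        self._query_hints = MappingProxyType(dict(query_hints))

    def get_table_properties(self):
        return self._table_properties

    def get_query_hints(self):
        return self._query_hints


def resolve_start_offset(context):
    """The factory decides: only 'offset' may be taken from the hints."""
    hints = context.get_query_hints()
    if "offset" in hints:
        return hints["offset"]
    # Assumed fallback when neither the hint nor the DDL sets an offset.
    return context.get_table_properties().get("offset", "latest")
```

Because the two maps are never merged, a hinted `topic` is simply not consulted by the factory, and the catalog-defined value stays in effect.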

Regards,
Timo

On 10.03.20 10:34, Danny Chan wrote:
Thanks Timo ~

Personally I would say that offset > 0 and start offset = 10 do not have the same semantics, so from the SQL aspect, we cannot implement a “starting offset” hint for a query with such a syntax.

And the CREATE TABLE LIKE syntax is a DDL, which is just verbose for defining such dynamic parameters even if it could do that. Shall we force users to define a temporary table for each query with dynamic params? I would say it’s an awkward solution.

"Hints should give "hints" but not affect the actual produced result.” You mentioned that multiple times; could you give a reason? What’s the problem if we use the table hints to support “start offset”? From my side I see some benefits:

• It’s very convenient to set up these parameters, the syntax is very much like the DDL definition
• Its scope is very clear, right on the table it is attached to
• It does not affect the table schema, which means that in order to specify the offset there is no need to define an offset column, which is weird actually. Offset should never be a column, it’s more like metadata or a start option.

So in total, FLIP-110 uses the offset more like a Hive partition prune. We can do that if we have an offset column, but in most cases we do not define one, so there is actually no conflict or overlap.

Best,
Danny Chan

On March 10, 2020 at 4:28 PM (+0800), Timo Walther <twal...@apache.org> wrote:
Hi Danny,

shouldn't FLIP-110[1] solve most of the problems we have around defining table properties more dynamically without manual schema work? Also offset definition is easier with such a syntax. They must not be defined in catalog but could be temporary tables that extend from the original table.

In general, we should aim to keep the syntax concise and not provide too many ways of doing the same thing. Hints should give "hints" but not affect the actual produced result.

Some connector properties might also change the plan or schema in the future. E.g. they might also define whether a table source supports certain push-downs (e.g. predicate push-down).

Dawid is currently working on a draft that might make it possible to expose a Kafka offset via the schema such that `SELECT * FROM Topic WHERE offset > 10` would become possible and could be pushed down. But this is, of course, not planned initially.

Regards,
Timo

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE

On 10.03.20 08:34, Danny Chan wrote:
Thanks Wenlong ~

For PROPERTIES Hint Error handling

Actually we have no way to figure out whether an erroneous hint is a PROPERTIES hint. For example, if a user writes a hint like ‘PROPERTIAS’, we do not know if this hint is a PROPERTIES hint; what we know is that the hint name was not registered in our Flink.

If the user writes the hint name correctly (i.e. PROPERTIES), we can enforce the validation of the hint options through the pluggable HintOptionChecker.

For PROPERTIES Hint Option Format

For a key value style hint option, the key can be either a simple identifier or a string literal, which means that it’s compatible with our DDL syntax. We support simple identifiers because many other hints do not have complex multi-component keys like the table properties, and we want to unify the parse block.

Best,
Danny Chan

On March 10, 2020 at 3:19 PM (+0800), wenlong.lwl <wenlong88....@gmail.com> wrote:
Hi Danny, thanks for the proposal.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> +1 for adding table hints, it is really a necessary
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> feature for flink sql to integrate with a catalog.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For error handling, I think it would be more natural
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to throw an exception when an erroneous table hint is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provided, because the properties in the hint will be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged and used to find the table factory, which
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would cause an exception when erroneous properties
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are provided, right? On the other hand, unlike other
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hints which just affect the way to execute the query,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the property table hint actually affects the result
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the query, so we should never ignore the given
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> property hints.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the format of property hints, currently, in sql
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> client, we accept properties in format of string only
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in DDL: 'connector.type'='kafka'. I think the format
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of properties in hint should be the same as the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> format we defined in ddl. What do you think?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bests,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wenlong Lyu
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To Weike: About the Error Handling
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be consistent with other SQL vendors, the default
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is to log warnings, and if there is any error
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (invalid hint name or options), the hint is just
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignored. I have already addressed this in the wiki.
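The two error-handling positions in this exchange (throw an exception on a bad hint, or log a warning and ignore it) can be contrasted in a minimal sketch. Everything here is hypothetical: `REGISTERED_HINTS`, `resolve_hint`, and the `strict` flag are names invented for illustration and are not Flink APIs.

```python
import logging

logger = logging.getLogger("hints")

# Hypothetical registry of known hint names (illustration only, not Flink's).
REGISTERED_HINTS = {"PROPERTIES"}


def resolve_hint(name, options, strict):
    """Return the hint options to apply, or None if the hint is dropped.

    strict=True  models "throw an exception on a bad hint".
    strict=False models "log a warning and ignore the hint".
    """
    if name.upper() not in REGISTERED_HINTS:
        message = f"Hint '{name}' is not registered"
        if strict:
            raise ValueError(message)
        logger.warning("%s; ignoring it", message)
        return None
    return dict(options)
```

With the lenient policy, a misspelled `PROPERTIAS` hint is dropped and the query would run against the unmodified table properties, which is the behavior the "never ignore" argument objects to.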
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To Timo: About the PROPERTIES Table Hint
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> • The properties hint is also optional: the user can
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pass in an option to override the table properties,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but this does not mean it is required.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> • They should not include semantics: do the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> properties belong to the semantics? I don't think
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> so, the plan does not change, right? The result set
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may be affected, but there are already some hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that do so, for example, the MS-SQL MAXRECURSION and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SNAPSHOT hints [1].
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SQL standard compared to the hints way (which is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included in comments).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> • I actually didn't find any vendors that support
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such grammar, and there is no way to override table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> level properties dynamically. For normal RDBMS, I
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think there are no requests for such dynamic
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parameters because all the tables have the same
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> storage and computation and they are almost all
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch tables.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> • While Flink as a computation engine has many
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors, especially for some message queues like
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kafka, we would have a start_offset which is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different each time we start the query. Such
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parameters cannot be persisted to the catalog
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because they are not static. This is actually the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> background for proposing table hints to indicate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such properties dynamically.
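The per-query override described for values such as a Kafka start offset can be pictured as a plain dictionary overlay. This is only a sketch under that assumption: `merge_table_options` is an invented name, and the key `connector.start_offset` is hypothetical (only `start_offset` and `'connector.type'='kafka'` appear in the thread).

```python
def merge_table_options(catalog_properties, hint_options):
    """Overlay per-query hint options on the properties stored in the catalog.

    The catalog entry itself is left untouched, so the override lasts only
    for the query that carries the hint.
    """
    merged = dict(catalog_properties)  # copy: never mutate the catalog entry
    merged.update(hint_options)        # hint values win on key collisions
    return merged


# Hypothetical catalog entry and a per-query override of the start offset.
catalog = {"connector.type": "kafka", "connector.start_offset": "0"}
per_query = merge_table_options(catalog, {"connector.start_offset": "1024"})
```

Because the merge works on a copy, the persisted table definition stays static while each query can supply its own value.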
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To Jark and Jingsong: I have removed the query hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> part and changed the title.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On March 9, 2020, 5:46 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> thanks for the proposal. I agree with Jark and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jingsong. Planner hints and table hints are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> orthogonal topics that should be discussed
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separately.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I share Jingsong's opinion that we should not use
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> planner hints for passing connector properties.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Planner hints should be optional at any time. They
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should not include semantics but only affect
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution time. Connector properties are an
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> important part of the query itself.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have you thought about options such as
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `SELECT * FROM t(k=v, k=v)`? How do other vendors
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deal with this problem?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I took a look at the FLIP; most of the content is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> talking about query hints. It is hard to discuss
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and vote on. So +1 to split it as Jark said.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Another thing is the configuration that is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> suitable to configure with table hints:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "connector.path" and "connector.topic". Are they
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> really suitable for table hints? Looks weird to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> me, because I think these properties are the core
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the table.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jingsong Lee
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Danny for starting the discussion.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> +1 for this feature.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we just focus on the table hints not the query
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hints in this release, could you split the FLIP
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> into two FLIPs? Because it's hard to vote on part
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of a FLIP. You can keep the table hints proposal
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in FLIP-113 and move query hints into another
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP. So that we can focus on the table hints in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the FLIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <kyled...@connect.hku.hk> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a nice feature, +1.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> One thing I am interested in but not mentioned in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the proposal is the error handling, as it is
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quite common for users to write inappropriate
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hints in SQL code. If illegal or "bad" hints are
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> given, would the system simply ignore them or
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> throw exceptions?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks : )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Weike
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Note:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we only plan to support table hints in Flink
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> release 1.11, so please focus mainly on the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table hints part and just ignore the planner
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hints, sorry for that mistake ~
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On March 9, 2020, 4:36 PM +0800, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, fellows ~
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to propose support for SQL hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in our Flink SQL.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We would support the hint syntax as follows:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> join
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> emp.deptno = dept.deptno
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Basically we would support both query hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (after the SELECT keyword) and table hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (after the referenced table name). For 1.11, we
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> plan to only support table hints with a hint
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> probably named PROPERTIES:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am looking forward to your comments.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You can access the FLIP here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
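To make the proposed table hint syntax concrete, here is a rough sketch that pulls the key/value pairs out of a `PROPERTIES(...)` hint comment. It is a toy regular-expression parser, not how Flink or Calcite parse hints; `parse_properties_hint` is an invented name. It assumes keys are either simple identifiers or single-quoted strings and values are single-quoted strings.

```python
import re

# Matches the hint comment: /*+ PROPERTIES( ... ) */
_HINT = re.compile(r"/\*\+\s*PROPERTIES\s*\((.*?)\)\s*\*/", re.IGNORECASE | re.DOTALL)
# Matches one option: key is an identifier or a 'quoted string'; value is quoted.
_OPTION = re.compile(r"\s*(?:'([^']*)'|([A-Za-z_][A-Za-z0-9_]*))\s*=\s*'([^']*)'\s*")


def parse_properties_hint(sql_fragment):
    """Return the PROPERTIES hint options as a dict, or {} if no hint is present.

    Toy limitation: values containing commas or parentheses are not supported.
    """
    hint = _HINT.search(sql_fragment)
    if hint is None or not hint.group(1).strip():
        return {}
    options = {}
    for part in hint.group(1).split(","):
        match = _OPTION.fullmatch(part)
        if match is None:
            raise ValueError(f"Malformed hint option: {part.strip()!r}")
        quoted_key, identifier_key, value = match.groups()
        options[quoted_key if quoted_key is not None else identifier_key] = value
    return options
```

For example, `parse_properties_hint("dept /*+ PROPERTIES(k1='v1', k2='v2') */")` yields the two options as a dictionary, and a fragment without a hint yields an empty dictionary.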