I have one question about adding the `supportedHintOptions` method to `TableFactory`. It seems `TableFactory` is the base factory interface for all *table module* related instances, such as catalogs, modules, formats and so on. It was not created only for *tables*. Would it be possible to move the method to `TableSourceFactory`?
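To make the question concrete, here is a minimal sketch of the placement I have in mind. Everything in it is a hypothetical stand-in (`BaseFactory`, `SourceFactory`, `Option`, the `offset` key); none of it is Flink's actual interface. It only illustrates that hint options would be declared on the source-specific factory, so catalog, module and format factories never have to know about them.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class HintOptionsPlacementSketch {

    /** Hypothetical stand-in for a typed option; not Flink's ConfigOption. */
    static final class Option {
        final String key;

        Option(String key) {
            this.key = key;
        }
    }

    /** Hypothetical base factory shared by catalogs, modules, formats and sources. */
    interface BaseFactory {
        String identifier();
    }

    /** Hypothetical source factory: the only place where hint options are declared. */
    interface SourceFactory extends BaseFactory {
        Set<Option> supportedHintOptions();
    }

    /** A catalog factory has no notion of table hints and declares none. */
    static final class CatalogFactory implements BaseFactory {
        @Override
        public String identifier() {
            return "my-catalog";
        }
    }

    /** A source factory that allows exactly one (made-up) option to be hinted. */
    static final class KafkaLikeSourceFactory implements SourceFactory {
        @Override
        public String identifier() {
            return "kafka-like";
        }

        @Override
        public Set<Option> supportedHintOptions() {
            return Collections.singleton(new Option("offset"));
        }
    }

    /** Returns the hintable keys of a factory; empty for anything that is not a source factory. */
    static Set<String> hintKeys(BaseFactory factory) {
        Set<String> keys = new HashSet<>();
        if (factory instanceof SourceFactory) {
            for (Option option : ((SourceFactory) factory).supportedHintOptions()) {
                keys.add(option.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(hintKeys(new KafkaLikeSourceFactory()));
        System.out.println(hintKeys(new CatalogFactory()));
    }
}
```

With this shape the planner only asks source factories for hintable options, and the base interface stays free of table-scan concerns.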
Best,
Kurt

On Wed, Mar 18, 2020 at 10:59 AM Danny Chan <yuzhao....@gmail.com> wrote:

> Thanks Timo ~
>
> For the naming itself, I also think PROPERTIES is not that concise, so +1 for OPTIONS (I had thought about that, but a lot of code in current Flink calls them properties, e.g. DescriptorProperties and #getSupportedProperties). Let's use OPTIONS if this is our new preference.
>
> +1 to `Set<ConfigOption> supportedHintOptions()` because a ConfigOption can carry more information. AFAIK, Spark also calls them table options instead of properties. [1]
>
> In my local POC, I did create a new CatalogTable, and it works well for the current connectors: all the DDL tables finally yield a CatalogTable instance, and we can apply the options to that (in the CatalogSourceTable, when we generate the TableSource). The pro is that we do not need to modify the code of the connectors themselves. If we split the options from CatalogTable, we may need to add some additional logic to each connector factory in order to merge these properties (and the logic is almost the same everywhere). What do you think about this?
>
> [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
>
> Best,
> Danny Chan
> On Mar 17, 2020 at 10:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> > Hi Danny,
> >
> > thanks for updating the FLIP. I think your current design is sufficient to separate hints from result-related properties.
> >
> > One remark on the naming itself: I would vote for calling the hints around table scan `OPTIONS('k'='v')`. We used the term "properties" in the past but since we want to unify the Flink configuration experience, we should use consistent naming and classes around `ConfigOptions`.
> >
> > It would be nice to use `Set<ConfigOption> supportedHintOptions();` to start using config options instead of pure string properties.
> > This will also allow us to generate documentation in the future around supported data types, ranges, etc. for options. At some point we would also like to drop the `DescriptorProperties` class. "Options" is also used in the documentation [1] and in the SQL/MED standard [2].
> >
> > Furthermore, I would still vote for separating CatalogTable and hint options. Otherwise the planner would need to create a new CatalogTable instance which might not always be easy. We should offer them via:
> >
> > org.apache.flink.table.factories.TableSourceFactory.Context#getHints: ReadableConfig
> >
> > What do you think?
> >
> > Regards,
> > Timo
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
> > [2] https://wiki.postgresql.org/wiki/SQL/MED
> >
> > On 12.03.20 15:06, Stephan Ewen wrote:
> > > @Danny sounds good.
> > >
> > > Maybe it is worth listing all the classes of problems that you want to address and then look at each class and see if hints are a good default solution or a good optional way of simplifying things?
> > > The discussion has grown a lot and it is starting to be hard to distinguish the parts where everyone agrees from the parts where there are concerns.
> > >
> > > On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
> > >
> > > > Thanks Stephan ~
> > > >
> > > > We can remove the support for properties that may change the semantics of a query if you think that is trouble.
> > > >
> > > > How about we support the /*+ properties() */ hint only for optimization parameters, such as the fetch size of a source or something like that? Does that make sense?
> > > >
> > > > On Thu, Mar 12, 2020 at 7:45 PM, Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > > I think Bowen has actually put it very well.
> > > > >
> > > > > (1) Hints that change semantics look like trouble waiting to happen.
> > > > > For example, Kafka offset handling should be in filters. The Kafka source should support predicate pushdown.
> > > > >
> > > > > (2) Hints should not be a workaround for current shortcomings. A lot of what is suggested above sounds exactly like that: working around catalog/DDL shortcomings, missing exposure of metadata (offsets), missing predicate pushdown in Kafka. Abusing a feature like hints now as a quick fix for these issues, rather than fixing the root causes, will very likely bite us back badly in the future.
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > > On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
> > > > >
> > > > > > It seems this FLIP's name is somewhat misleading. From my understanding, this FLIP is trying to address the dynamic parameter issue, and table hints are the way we want to choose. I think we should focus on "what's the right way to solve dynamic properties" instead of discussing "whether table hints can affect query semantics".
> > > > > >
> > > > > > For now, there are three proposed ways to achieve dynamic properties:
> > > > > > 1. FLIP-110: create temporary table xx like xx with (xxx)
> > > > > > 2. use custom "from t with (xxx)" syntax
> > > > > > 3. "Borrow" the table hints to have a special PROPERTIES hint.
> > > > > >
> > > > > > The first one doesn't break anything; the only problem I see is that it is a little more verbose than the table hint approach. I can imagine that when someone uses the SQL CLI, it's quite common to modify table properties. Some use cases I can think of:
> > > > > > 1. The source contains some corrupted data, and I want to turn on the "ignore-error" flag for certain formats.
> > > > > > 2. I have a Kafka table and want to see some sample data from the beginning, so I change the offset to "earliest". Then I want to observe the latest data which keeps coming in, so I would write another query to select from the latest table.
> > > > > > 3. I want my JDBC sink to flush data more eagerly so that I can observe the data from the database side.
> > > > > >
> > > > > > Most of such use cases are quite ad hoc. If every time I want a different experience I need to create a temporary table and then also modify my query, it doesn't feel smooth. Embedding such dynamic properties into the query would give a better user experience.
> > > > > >
> > > > > > Both 2 & 3 can make this happen. The con of #2 is breaking SQL compliance, and #3 only breaks some unwritten rules, but we can have an explanation for that. And I really doubt whether users would complain about this when they actually have a flexible and good experience using it.
> > > > > >
> > > > > > My preference would be #3 > #1 > #2, what do you think?
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > > On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Aljoscha ~
> > > > > > >
> > > > > > > I agree that most query hints are optional optimizer instructions, especially in traditional RDBMSs.
> > > > > > >
> > > > > > > But, just like BenChao said, Flink as a computation engine has many different kinds of data sources, thus dynamic parameters like start_offset can only be bound to each table scope. We cannot set a session config like KSQL does, because there it is all about Kafka:
> > > > > > > > SET 'auto.offset.reset'='earliest';
> > > > > > >
> > > > > > > Thus the most flexible way to set up these dynamic params is to bind them to the table scope in the query when we want to override something, so we have the solutions above (with pros and cons from my side):
> > > > > > >
> > > > > > > • 1. Select * from t(offset=123) (from Timo)
> > > > > > >
> > > > > > > Pros:
> > > > > > > - Easy to add
> > > > > > > - Parameters are part of the main query
> > > > > > > Cons:
> > > > > > > - Not SQL compliant
> > > > > > >
> > > > > > > • 2. Select * from t /*+ PROPERTIES(offset=123) */ (from me)
> > > > > > >
> > > > > > > Pros:
> > > > > > > - Easy to add
> > > > > > > - SQL compliant because it is nested in the comments
> > > > > > >
> > > > > > > Cons:
> > > > > > > - Parameters are not part of the main query
> > > > > > > - Cryptic syntax for new users
> > > > > > >
> > > > > > > The biggest problem with the hints approach may be "whether hints must be optional". Actually we thought about 1 for a while but abandoned it because it breaks the SQL standard too much. We replaced it with 2, because the hint syntax does not break the SQL standard (it is nested in comments).
> > > > > > >
> > > > > > > What if we have a special /*+ PROPERTIES */ hint that allows overriding some properties of a table dynamically? It does not break anything, at least for current Flink use cases.
> > > > > > >
> > > > > > > Planner hints are optional just because they are naturally enforcers for the planner; most of them aim to instruct the optimizer. Table hints are a little different: they can specify table metadata such as an index column, and they are a very convenient way to specify table properties.
> > > > > > >
> > > > > > > Or shall we not call /*+ PROPERTIES(offset=123) */ a table hint? We could call it table dynamic parameters.
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > On Mar 11, 2020 at 9:20 PM +0800, Aljoscha Krettek <aljos...@apache.org> wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I don't understand this discussion. Hints, as I understand them, should work like this:
> > > > > > > >
> > > > > > > > - hints are *optional* advice for the optimizer to try and help it to find a good execution strategy
> > > > > > > > - hints should not change query semantics, i.e. they should not change connector properties; executing a query while taking the hints into account *must* produce the same result as executing the query without taking them into account
> > > > > > > >
> > > > > > > > From these simple requirements you can derive a solution that makes sense. I don't have a strong preference for the syntax but we should strive to be in line with prior work.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Aljoscha
> > > > > > > >
> > > > > > > > On 11.03.20 11:53, Danny Chan wrote:
> > > > > > > > > Thanks Timo for summarizing the 3 options ~
> > > > > > > > >
> > > > > > > > > I agree with Kurt that option 2 is too complicated to use because:
> > > > > > > > >
> > > > > > > > > • As a Kafka topic consumer, the user must both define the virtual column for the start offset and apply a special filter predicate in each query
> > > > > > > > > • For the internal implementation, metadata column pushdown is another hard topic: each kind of message queue may have its own offset attribute, and we need to consider the expression type for each kind; the source also needs to recognize the constant column as a config option (which is weird because usually what we push down is a table column)
> > > > > > > > >
> > > > > > > > > For option 1 and option 3, I think there is no difference; option 1 is also a hint syntax, one that was introduced in Sybase and adopted and then deprecated by MS-SQL in the 1990s because of its ambiguity. Personally I prefer the /*+ */ style table hint over the WITH keyword for these reasons:
> > > > > > > > >
> > > > > > > > > • We do not break standard SQL; the hints are nested in SQL comments
> > > > > > > > > • We do not need to introduce an additional WITH keyword, which could appear anywhere in a query because a table can be referenced in all kinds of SQL contexts: INSERT/DELETE/FROM/JOIN ….
> > > > > > > > > That would make our SQL queries deviate too much from the SQL standard
> > > > > > > > > • We would have a uniform syntax for table hints and query hints: one syntax fits all and is easier to use
> > > > > > > > >
> > > > > > > > > And here is the reason why we chose a uniform Oracle-style query hint syntax, as explained by Julian Hyde when we designed the syntax in the Calcite community:
> > > > > > > > >
> > > > > > > > > I don’t much like the MSSQL-style syntax for table hints. It adds a new use of the WITH keyword that is unrelated to the use of WITH for common-table expressions.
> > > > > > > > >
> > > > > > > > > A historical note. Microsoft SQL Server inherited its hint syntax from Sybase a very long time ago. (See “Transact SQL Programming”[1], page 632, “Optimizer hints”. The book was written in 1999, and covers Microsoft SQL Server 6.5 / 7.0 and Sybase Adaptive Server 11.5, but the syntax very likely predates Sybase 4.3, from which Microsoft SQL Server was forked in 1993.)
> > > > > > > > >
> > > > > > > > > Microsoft later added the WITH keyword to make it less ambiguous, and has now deprecated the syntax that does not use WITH.
> > > > > > > > >
> > > > > > > > > They are forced to keep the syntax for backwards compatibility but that doesn’t mean that we should shoulder their burden.
> > > > > > > > >
> > > > > > > > > I think formatted comments are the right container for hints because it allows us to change the hint syntax without changing the SQL parser, and makes clear that we are at liberty to ignore hints entirely.
> > > > > > > > >
> > > > > > > > > Julian
> > > > > > > > >
> > > > > > > > > [1] https://www.amazon.com/s?k=9781565924017
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > On Mar 11, 2020 at 4:03 PM +0800, Timo Walther <twal...@apache.org> wrote:
> > > > > > > > > > Hi Danny,
> > > > > > > > > >
> > > > > > > > > > it is true that our DDL is not standard compliant by using the WITH clause. Nevertheless, we aim for not diverging too much and the LIKE clause is an example of that. It will solve things like overwriting WATERMARKs, adding/modifying properties and inheriting the schema.
> > > > > > > > > >
> > > > > > > > > > Bowen is right that Flink's DDL is mixing 3 types of definitions together. We are not the first ones that try to solve this. There is also the SQL/MED standard [1] that tried to tackle this problem. I think it was not considered when designing the current DDL.
> > > > > > > > > >
> > > > > > > > > > Currently, I see 3 options for handling Kafka offsets.
> > > > > > > > > > I will give some examples and look forward to feedback here:
> > > > > > > > > >
> > > > > > > > > > *Option 1*: Runtime and semantic params as part of the query
> > > > > > > > > >
> > > > > > > > > > `SELECT * FROM MyTable('offset'=123)`
> > > > > > > > > >
> > > > > > > > > > Pros:
> > > > > > > > > > - Easy to add
> > > > > > > > > > - Parameters are part of the main query
> > > > > > > > > > - No complicated hinting syntax
> > > > > > > > > >
> > > > > > > > > > Cons:
> > > > > > > > > > - Not SQL compliant
> > > > > > > > > >
> > > > > > > > > > *Option 2*: Use metadata in query
> > > > > > > > > >
> > > > > > > > > > `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`
> > > > > > > > > >
> > > > > > > > > > `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`
> > > > > > > > > >
> > > > > > > > > > Pros:
> > > > > > > > > > - SQL compliant in the query
> > > > > > > > > > - Access of metadata in the DDL which is required anyway
> > > > > > > > > > - Regular pushdown rules apply
> > > > > > > > > >
> > > > > > > > > > Cons:
> > > > > > > > > > - Users need to add an additional column in the DDL
> > > > > > > > > >
> > > > > > > > > > *Option 3*: Use hints for properties
> > > > > > > > > >
> > > > > > > > > > `SELECT * FROM MyTable /*+ PROPERTIES('offset'=123) */`
> > > > > > > > > >
> > > > > > > > > > Pros:
> > > > > > > > > > - Easy to add
> > > > > > > > > >
> > > > > > > > > > Cons:
> > > > > > > > > > - Parameters are not part of the main query
> > > > > > > > > > - Cryptic syntax for new users
> > > > > > > > > > - Not standard compliant.
> > > > > > > > > >
> > > > > > > > > > If we go with this option, I would suggest making it available in a separate map and not mixing it with statically defined properties.
> > > > > > > > > > Such that the factory can decide which properties have the right to be overwritten by the hints:
> > > > > > > > > > TableSourceFactory.Context.getQueryHints(): ReadableConfig
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Timo
> > > > > > > > > >
> > > > > > > > > > [1] https://en.wikipedia.org/wiki/SQL/MED
> > > > > > > > > >
> > > > > > > > > > On 11.03.20 07:21, Danny Chan wrote:
> > > > > > > > > > > Thanks Bowen ~
> > > > > > > > > > >
> > > > > > > > > > > I agree we should somehow categorize our connector parameters.
> > > > > > > > > > >
> > > > > > > > > > > For type 1, I’m already preparing a solution like the Confluent schema registry + Avro schema inference thing, so this may not be a problem in the near future.
> > > > > > > > > > >
> > > > > > > > > > > For type 3, I have some questions:
> > > > > > > > > > >
> > > > > > > > > > > > "SELECT * FROM mykafka WHERE offset > 12pm yesterday"
> > > > > > > > > > >
> > > > > > > > > > > Where does the offset column come from? Is it a virtual column from the table schema? You said that
> > > > > > > > > > >
> > > > > > > > > > > > They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL
> > > > > > > > > > >
> > > > > > > > > > > But why can you reference it in the query? I’m confused about that, can you elaborate a little?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Danny Chan
> > > > > > > > > > > On Mar 11, 2020 at 12:52 PM +0800, Bowen Li <bowenl...@gmail.com> wrote:
> > > > > > > > > > > > Thanks Danny for kicking off the effort
> > > > > > > > > > > >
> > > > > > > > > > > > The root cause of too much manual work is that Flink DDL has mixed 3 types of params together and doesn't handle each of them very well. Below is how I categorize them, with the corresponding solutions in my mind:
> > > > > > > > > > > >
> > > > > > > > > > > > - type 1: Metadata of external data, like external endpoint/url, username/pwd, schemas, formats.
> > > > > > > > > > > >
> > > > > > > > > > > > Such metadata is mostly already accessible in external systems as long as endpoints and credentials are provided. Flink can get it through catalogs, but we haven't had many catalogs yet and thus Flink just hasn't been able to leverage that. So the solution should be building more catalogs. Such params should be part of a Flink table DDL/definition, and not overridable by any means.
> > > > > > > > > > > >
> > > > > > > > > > > > - type 2: Runtime params, like the JDBC connector's fetch size or the Elasticsearch connector's bulk flush size.
> > > > > > > > > > > >
> > > > > > > > > > > > Such params don't affect query results, but affect how results are produced (e.g. fast or slow, aka performance) - they are essentially execution and implementation details.
> > > > > > > > > > > > They change often in exploration or development stages, but not very frequently in well-defined long-running pipelines. They should always have default values and can be missing in a query. They can be part of a table DDL/definition, but should also be replaceable in a query - *this is what table "hints" in FLIP-113 should cover*.
> > > > > > > > > > > >
> > > > > > > > > > > > - type 3: Semantic params, like the Kafka connector's start offset.
> > > > > > > > > > > >
> > > > > > > > > > > > Such params affect query results - the semantics. They'd better be filter conditions in the WHERE clause that can be pushed down. They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL, nor be persisted in catalogs. If they must be, users should create views to keep such params around (note this is different from variable substitution).
> > > > > > > > > > > >
> > > > > > > > > > > > Take Flink-Kafka as an example.
> > > > > > > > > > > > Once we get these params right, here are the steps users need to do to develop and run a Flink job:
> > > > > > > > > > > > - configure a Flink ConfluentSchemaRegistry with url, username, and password
> > > > > > > > > > > > - run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified timestamp) in SQL CLI; Flink automatically retrieves all metadata of schema, file format, etc. and starts the job
> > > > > > > > > > > > - users want to make the job read the Kafka topic faster, so it goes as "SELECT * FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
> > > > > > > > > > > > - done and satisfied, users submit it to production
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2)", I think it's a nice-to-have feature, but not a strategically critical, long-term solution, because
> > > > > > > > > > > > 1) It may seem promising at the current stage to solve the too-much-manual-work problem, but that's only because Flink hasn't leveraged catalogs well and handled the 3 types of params above properly. Once we get the param types right, the LIKE syntax won't be that important, and will be just an easier way to create tables without retyping long fields like username and pwd.
> > > > > > > > > > > > 2) Note that only some rare types of catalog can store k-v property pairs, so tables created this way often cannot be persisted.
> > > > > > > > > > > > In the foreseeable future, such a catalog will only be HiveCatalog, and not everyone has a Hive metastore. To be honest, without persistence, recreating tables every time this way is still a lot of keyboard typing.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Bowen
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > If a specific connector wants to have such a parameter and read it out of the configuration, then that's fine.
> > > > > > > > > > > > > If we are talking about a configuration for all kinds of sources, I would be super careful about that.
> > > > > > > > > > > > > It's true it can solve maybe 80% of the cases, but it will also make the remaining 20% feel weird.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Kurt
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Kurt,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > #3 Regarding the global offset:
> > > > > > > > > > > > > > I'm not saying to use the global configuration to override connector properties by the planner.
> > > > > > > > > > > > > > But the connector should take this configuration and translate it into its client API.
> > > > > > > > > > > > > > AFAIK, almost all message queues support earliest, latest and a timestamp value as the start point.
> > > > > > > > > > > > > > So we can support 3 options for this configuration: "earliest", "latest" and a timestamp string value.
> > > > > > > > > > > > > > Of course, this can't solve 100% of the cases, but I guess it can solve 80% or 90% of them.
> > > > > > > > > > > > > > And the remaining cases can be resolved by the LIKE syntax, which I guess are not very common cases.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Good to have such lovely discussions. I also want to share some of my opinions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > #1 Regarding error handling: I also think ignoring invalid hints would be dangerous; maybe the simplest solution is to just throw an exception.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > #2 Regarding property replacement: I don't think we should constrain ourselves to the meaning of the word "hint" and forbid it from modifying any properties which can affect query results.
> > > > > > > > > > > > > > > IMO `PROPERTIES` is one of the table hints, and a powerful one. It can modify properties located in the DDL's WITH block. But I also see the harm if we make it too flexible, like changing the Kafka topic name with a hint. Such a use case is not common and sounds very dangerous to me. I would propose we have a map of hintable properties for each connector, and validate that all passed-in properties are actually hintable. Combined with #1 error handling, we can throw an exception once we receive an invalid property.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > #3 Regarding the global offset: I'm not sure it's feasible. Different connectors will have totally different properties to represent the offset: some might be timestamps, some might be string literals like "earliest", and others might be just integers.
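Point #2 above, combined with the error handling from #1, could look roughly like the following. This is only a sketch under assumptions: the class name, the `applyHints` method and the property keys (`topic`, `offset`) are made up for illustration and are not part of any Flink API. It validates each hinted key against a per-connector set of hintable properties and throws on anything else, then overlays the accepted hints on the properties from the DDL.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HintValidationSketch {

    /**
     * Overlays hint properties on the DDL properties.
     * Throws if any hinted key is not in the connector's hintable set.
     * The input maps are left untouched; a new merged map is returned.
     */
    static Map<String, String> applyHints(
            Map<String, String> ddlProperties,
            Map<String, String> hints,
            Set<String> hintableKeys) {
        for (String key : hints.keySet()) {
            if (!hintableKeys.contains(key)) {
                throw new IllegalArgumentException(
                        "Property '" + key + "' cannot be set via a hint; hintable properties: "
                                + hintableKeys);
            }
        }
        Map<String, String> merged = new HashMap<>(ddlProperties);
        merged.putAll(hints);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical connector: only "offset" may be overridden, never "topic".
        Set<String> hintable = new HashSet<>(Arrays.asList("offset"));

        Map<String, String> ddl = new HashMap<>();
        ddl.put("topic", "orders");
        ddl.put("offset", "latest");

        Map<String, String> hints = new HashMap<>();
        hints.put("offset", "earliest");

        System.out.println(applyHints(ddl, hints, hintable));
    }
}
```

Failing fast keeps a typo such as `ofset` from being silently ignored, and keeping the hintable set per connector means the topic name can never be swapped by a hint.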
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Kurt
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I want to jump into the discussion about the "dynamic start offset" problem.
> > > > > > > > > > > > > > > > First of all, I share the same concern as Timo and Fabian, that the "start offset" affects the query semantics, i.e. the query result.
> > > > > > > > > > > > > > > > But "hints" are just used for optimization, which should not affect the result?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think the "dynamic start offset" is a very important usability problem which will be faced by many streaming platforms.
> > > > > > > > > > > > > > > > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH ('connector.startup-timestamp-millis' = '1578538374471')" is verbose; what if we have 10 tables to join?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, what I want to propose (should be another thread) is a global configuration to reset the start offsets of all the source connectors in the query session, e.g. "table.sources.start-offset".
This is possible now because `TableSourceFactory.Context` has a `getConfiguration` method to get the session configuration, which can be used to create an adapted TableSource. Then we can also expose it to the SQL CLI via the SET command, e.g. `SET 'table.sources.start-offset'='earliest';`, which is pretty simple and straightforward.

This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` which is very helpful IMO.

Best,
Jark

On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

compared to the hints, FLIP-110 is fully compliant with the SQL standard.

I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is too verbose or awkward for the power of basically changing the entire connector.
Usually, this statement would just precede the query in a multiline file. So it can be changed "in-place" like the hints you proposed.

Many companies have a well-defined set of tables that should be used. It would be dangerous if users could change the path or topic in a hint. The catalog/catalog manager should be the entity that controls which tables exist and how they can be accessed.

> what's the problem there if we use the table hints to support "start offset"?

IMHO it violates the meaning of a hint. According to the dictionary, a hint is "a statement that expresses indirectly what one prefers not to say explicitly". But offsets are a property that is very explicit.

If we go with the hint approach, it should be expressible in the TableSourceFactory which properties are supported for hinting.
Or do you plan to offer those hints in a separate Map<String, String> that cannot overwrite existing properties? I think this would be a different story...

Regards,
Timo

On 10.03.20 10:34, Danny Chan wrote:

Thanks Timo ~

Personally I would say that offset > 0 and start offset = 10 do not have the same semantics, so from the SQL aspect, we cannot implement a "starting offset" hint for a query with such a syntax.

And the CREATE TABLE LIKE syntax is a DDL which is just verbose for defining such dynamic parameters, even if it could do that. Shall we force users to define a temporary table for each query with dynamic params? I would say it's an awkward solution.
"Hints should give "hints" but not affect the actual produced result." You mentioned that multiple times; could you give a reason? What's the problem there if we use the table hints to support "start offset"? From my side I see some benefits:

• It's very convenient to set up these parameters; the syntax is very much like the DDL definition
• Its scope is very clear, right on the table it is attached to
• It does not affect the table schema, which means in order to specify the offset, there is no need to define an offset column, which is weird actually; offset should never be a column, it's more like metadata or a start option.
So in total, FLIP-110 uses the offset more like a Hive partition prune. We can do that if we have an offset column, but in most cases we do not define one, so there is actually no conflict or overlap.

Best,
Danny Chan

On March 10, 2020, 4:28 PM +0800, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

shouldn't FLIP-110[1] solve most of the problems we have around defining table properties more dynamically without manual schema work? Also offset definition is easier with such a syntax. They need not be defined in the catalog but could be temporary tables that extend from the original table.

In general, we should aim to keep the syntax concise and not provide too many ways of doing the same thing. Hints should give "hints" but not affect the actual produced result.
Some connector properties might also change the plan or schema in the future. E.g. they might also define whether a table source supports certain push-downs (e.g. predicate push-down).

Dawid is currently working on a draft that might make it possible to expose a Kafka offset via the schema such that `SELECT * FROM Topic WHERE offset > 10` would become possible and could be pushed down. But this is, of course, not planned initially.
Regards,
Timo

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE

On 10.03.20 08:34, Danny Chan wrote:

Thanks Wenlong ~

For PROPERTIES Hint Error handling

Actually we have no way to figure out whether an erroneous hint is a PROPERTIES hint. For example, if the user writes a hint like 'PROPERTIAS', we do not know if this hint is a PROPERTIES hint; what we know is that the hint name was not registered in Flink.

If the user writes the hint name correctly (i.e.
PROPERTIES), we can indeed enforce the validation of the hint options through the pluggable HintOptionChecker.

For PROPERTIES Hint Option Format

For a key-value style hint option, the key can be either a simple identifier or a string literal, which means that it's compatible with our DDL syntax. We support simple identifiers because many other hints do not have complex multi-component keys like the table properties, and we want to unify the parse block.

Best,
Danny Chan

On March 10, 2020, 3:19 PM +0800, wenlong.lwl <wenlong88....@gmail.com> wrote:

Hi Danny, thanks for the proposal. +1 for adding table hints, it is really a necessary feature for Flink SQL to integrate with a catalog.
For error handling, I think it would be more natural to throw an exception when an erroneous table hint is provided, because the properties in the hint will be merged and used to find the table factory, which would cause an exception when erroneous properties are provided, right? On the other hand, unlike other hints which just affect the way the query is executed, the property table hint actually affects the result of the query, so we should never ignore the given property hints.

For the format of property hints: currently, in the SQL client, we accept properties only in string format in DDL: 'connector.type'='kafka'. I think the format of properties in a hint should be the same as the format we defined in DDL. What do you think?
Bests,
Wenlong Lyu

On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> wrote:

To Weike: About the Error Handling

To be consistent with other SQL vendors, the default is to log warnings, and if there is any error (invalid hint name or options), the hint is just ignored. I have already addressed this in the wiki.

To Timo: About the PROPERTIES Table Hint

• The properties hint is also optional; the user can pass in an option to override the table properties, but this does not mean it is required.
• They should not include semantics: do the properties belong to semantics? I don't think so, the plan does not change, right?
The result set may be affected, but there are already some hints that do so, for example, the MS-SQL MAXRECURSION and SNAPSHOT hints [1]
• `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard, compared to the hints way (which is included in comments)
• I actually didn't find any vendors that support such a grammar, and there is no way to override table-level properties dynamically. For a normal RDBMS, I think there are no requests for such dynamic parameters because all the tables have the same storage and computation and they are almost all batch tables.
• Flink as a computation engine, however, has many connectors. Especially for a message queue like Kafka, we would have a start_offset which is different each time we start the query. Such parameters cannot be persisted to the catalog because they are not static. This is actually the background for why we propose the table hints to indicate such properties dynamically.

To Jark and Jingsong: I have removed the query hints part and changed the title.
[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15

Best,
Danny Chan

On March 9, 2020, 5:46 PM +0800, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

thanks for the proposal. I agree with Jark and Jingsong. Planner hints and table hints are orthogonal topics that should be discussed separately.

I share Jingsong's opinion that we should not use planner hints for passing connector properties. Planner hints should be optional at any time. They should not include semantics but only affect execution time.
Connector properties are an important part of the query itself.

Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How do other vendors deal with this problem?

Regards,
Timo

On 09.03.20 10:37, Jingsong Li wrote:

Hi Danny, +1 for table hints, thanks for driving.

I took a look at the FLIP; most of the content is about query hints. That makes it hard to discuss and vote on. So +1 to split it as Jark said.
Another thing is which configuration is suitable to set with table hints: "connector.path" and "connector.topic", are they really suitable for table hints? Looks weird to me, because I think these properties are the core of the table.

Best,
Jingsong Lee

On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> wrote:

Thanks Danny for starting the discussion.
+1 for this feature.

If we just focus on the table hints, not the query hints, in this release, could you split the FLIP into two FLIPs? Because it's hard to vote on only part of a FLIP.
You can keep the table hints proposal in FLIP-113 and move query hints into another FLIP, so that we can focus on the table hints in the FLIP.

Thanks,
Jark

On Mon, 9 Mar 2020 at 17:14, DONG, Weike <kyled...@connect.hku.hk> wrote:

Hi Danny,

This is a nice feature, +1.
One thing I am interested in but which is not mentioned in the proposal is error handling. It is quite common for users to write inappropriate hints in SQL code; if illegal or "bad" hints are given, would the system simply ignore them or throw exceptions?

Thanks : )

Best,
Weike

On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <yuzhao....@gmail.com> wrote:

Note:
we only plan to support table hints in Flink release 1.11, so please focus mainly on the table hints part and
just ignore the planner hints, sorry for that mistake ~

Best,
Danny Chan

On March 9, 2020, 4:36 PM +0800, Danny Chan <yuzhao....@gmail.com> wrote:

Hi, fellows ~

I would like to propose support for SQL hints in Flink SQL.

We would support hint syntax as follows:

select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
from
emp /*+ INDEX(idx1, idx2) */
join
dept /*+ PROPERTIES(k1='v1', k2='v2') */
on
emp.deptno = dept.deptno

Basically we
would support both query hints (after the SELECT keyword) and table hints (after the referenced table name). For 1.11, we plan to support only table hints, with a hint probably named PROPERTIES:

table_name /*+ PROPERTIES(k1='v1', k2='v2') */

I am looking forward to your comments.

You can access the FLIP here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints

Best,
Danny Chan
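As an illustration of the semantics discussed in this thread, where options in a PROPERTIES table hint are merged with, and override, the properties from the DDL's WITH block, here is a minimal sketch. `TableHintMerger` and `merge` are hypothetical names used only for this example; they are not Flink APIs, and the thread does not settle how the planner would actually apply the hint.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): overlays hint options on DDL table properties. */
final class TableHintMerger {

    private TableHintMerger() {}

    /**
     * Returns a new map holding the DDL properties, with every key that also
     * appears in the hint replaced by the hint's value. Inputs are not modified.
     */
    static Map<String, String> merge(
            Map<String, String> ddlProperties, Map<String, String> hintOptions) {
        Map<String, String> merged = new HashMap<>(ddlProperties);
        // Hint options win on key conflicts; keys only in the hint are added.
        merged.putAll(hintOptions);
        return merged;
    }
}
```

Under this assumption, `dept /*+ PROPERTIES(k1='v1') */` would leave every WITH property of `dept` unchanged except `k1`, which would take the value 'v1' for that query only.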