A quick summary that focus of the discussion now shifts to be whether semantic params like kafka 'starting offset' should be table hints/properties, and if so, in what form.
I strongly believe the action of setting offset should *not* be part of a table, neither hints nor properties, for all the good reasons mentioned above. It doesn't describe the physical table/storage. It's indeed a filter describing what kind of data users want to query on top of all present, physical data. One way to think of that is 'offset' is a system column of Kafka, and Flink queries can filter on this system column. E.g. Postgres has a column 'xmin' to track commit timestamp automatically and users can filter on it [1]. Flink connectors should be able to define it's own system columns, recognize them from filters pushed down by Flink SQL framework, and apply accordingly. [1] https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#Commit_timestamp_tracking On Wed, Mar 11, 2020 at 6:20 AM Aljoscha Krettek <aljos...@apache.org> wrote: > Hi, > > I don't understand this discussion. Hints, as I understand them, should > work like this: > > - hints are *optional* advice for the optimizer to try and help it to > find a good execution strategy > - hints should not change query semantics, i.e. they should not change > connector properties executing a query with taking into account the > hints *must* produce the same result as executing the query without > taking into account the hints > > From these simple requirements you can derive a solution that makes > sense. I don't have a strong preference for the syntax but we should > strive to be in line with prior work. > > Best, > Aljoscha > > On 11.03.20 11:53, Danny Chan wrote: > > Thanks Timo for summarize the 3 options ~ > > > > I agree with Kurt that option2 is too complicated to use because: > > > > • As a Kafka topic consumer, the user must define both the virtual > column for start offset and he must apply a special filter predicate after > each query > > • And for the internal implementation, the metadata column push down is > another hard topic, each kind of message queue may have its offset > attribute, we need to consider the expression type for different kind; the > source also need to recognize the constant column as a config option(which > is weird because usually what we pushed down is a table column) > > > > For option 1 and option3, I think there is no difference, option1 is > also a hint syntax which is introduced in Sybase and referenced then > deprecated by MS-SQL in 199X years because of the ambitiousness. Personally > I prefer /*+ */ style table hint than WITH keyword for these reasons: > > > > • We do not break the standard SQL, the hints are nested in SQL comments > > • We do not need to introduce additional WITH keyword which may appear > in a query if we use that because a table can be referenced in all kinds of > SQL contexts: INSERT/DELETE/FROM/JOIN …. That would make our sql query > break too much of the SQL from standard > > • We would have uniform syntax for hints as query hint, one syntax fits > all and more easy to use > > > > > > And here is the reason why we choose a uniform Oracle style query > hint syntax which is addressed by Julian Hyde when we design the syntax > from the Calcite community: > > > > I don’t much like the MSSQL-style syntax for table hints. It adds a new > use of the WITH keyword that is unrelated to the use of WITH for > common-table expressions. > > > > A historical note. Microsoft SQL Server inherited its hint syntax from > Sybase a very long time ago. (See “Transact SQL Programming”[1], page 632, > “Optimizer hints”. The book was written in 1999, and covers Microsoft SQL > Server 6.5 / 7.0 and Sybase Adaptive Server 11.5, but the syntax very > likely predates Sybase 4.3, from which Microsoft SQL Server was forked in > 1993.) > > > > Microsoft later added the WITH keyword to make it less ambiguous, and > has now deprecated the syntax that does not use WITH. > > > > They are forced to keep the syntax for backwards compatibility but that > doesn’t mean that we should shoulder their burden. > > > > I think formatted comments are the right container for hints because it > allows us to change the hint syntax without changing the SQL parser, and > makes clear that we are at liberty to ignore hints entirely. > > > > Julian > > > > [1] https://www.amazon.com/s?k=9781565924017 < > https://www.amazon.com/s?k=9781565924017> > > > > Best, > > Danny Chan > > 在 2020年3月11日 +0800 PM4:03,Timo Walther <twal...@apache.org>,写道: > >> Hi Danny, > >> > >> it is true that our DDL is not standard compliant by using the WITH > >> clause. Nevertheless, we aim for not diverging too much and the LIKE > >> clause is an example of that. It will solve things like overwriting > >> WATERMARKs, add additional/modifying properties and inherit schema. > >> > >> Bowen is right that Flink's DDL is mixing 3 types definition together. > >> We are not the first ones that try to solve this. There is also the SQL > >> MED standard [1] that tried to tackle this problem. I think it was not > >> considered when designing the current DDL. > >> > >> Currently, I see 3 options for handling Kafka offsets. I will give some > >> examples and look forward to feedback here: > >> > >> *Option 1* Runtime and semantic parms as part of the query > >> > >> `SELECT * FROM MyTable('offset'=123)` > >> > >> Pros: > >> - Easy to add > >> - Parameters are part of the main query > >> - No complicated hinting syntax > >> > >> Cons: > >> - Not SQL compliant > >> > >> *Option 2* Use metadata in query > >> > >> `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))` > >> > >> `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'` > >> > >> Pros: > >> - SQL compliant in the query > >> - Access of metadata in the DDL which is required anyway > >> - Regular pushdown rules apply > >> > >> Cons: > >> - Users need to add an additional comlumn in the DDL > >> > >> *Option 3*: Use hints for properties > >> > >> ` > >> SELECT * > >> FROM MyTable /*+ PROPERTIES('offset'=123) */ > >> ` > >> > >> Pros: > >> - Easy to add > >> > >> Cons: > >> - Parameters are not part of the main query > >> - Cryptic syntax for new users > >> - Not standard compliant. > >> > >> If we go with this option, I would suggest to make it available in a > >> separate map and don't mix it with statically defined properties. Such > >> that the factory can decide which properties have the right to be > >> overwritten by the hints: > >> TableSourceFactory.Context.getQueryHints(): ReadableConfig > >> > >> Regards, > >> Timo > >> > >> [1] https://en.wikipedia.org/wiki/SQL/MED > >> > >> Currently I see 3 options as a > >> > >> > >> On 11.03.20 07:21, Danny Chan wrote: > >>> Thanks Bowen ~ > >>> > >>> I agree we should somehow categorize our connector parameters. > >>> > >>> For type1, I’m already preparing a solution like the Confluent schema > registry + Avro schema inference thing, so this may not be a problem in the > near future. > >>> > >>> For type3, I have some questions: > >>> > >>>> "SELECT * FROM mykafka WHERE offset > 12pm yesterday” > >>> > >>> Where does the offset column come from, a virtual column from the > table schema, you said that > >>> > >>>> They change > >>> almost every time a query starts and have nothing to do with metadata, > thus > >>> should not be part of table definition/DDL > >>> > >>> But why you can reference it in the query, I’m confused for that, can > you elaborate a little ? > >>> > >>> Best, > >>> Danny Chan > >>> 在 2020年3月11日 +0800 PM12:52,Bowen Li <bowenl...@gmail.com>,写道: > >>>> Thanks Danny for kicking off the effort > >>>> > >>>> The root cause of too much manual work is Flink DDL has mixed 3 types > of > >>>> params together and doesn't handle each of them very well. Below are > how I > >>>> categorize them and corresponding solutions in my mind: > >>>> > >>>> - type 1: Metadata of external data, like external endpoint/url, > >>>> username/pwd, schemas, formats. > >>>> > >>>> Such metadata are mostly already accessible in external system as > long as > >>>> endpoints and credentials are provided. Flink can get it thru > catalogs, but > >>>> we haven't had many catalogs yet and thus Flink just hasn't been able > to > >>>> leverage that. So the solution should be building more catalogs. Such > >>>> params should be part of a Flink table DDL/definition, and not > overridable > >>>> in any means. > >>>> > >>>> > >>>> - type 2: Runtime params, like jdbc connector's fetch size, > elasticsearch > >>>> connector's bulk flush size. > >>>> > >>>> Such params don't affect query results, but affect how results are > produced > >>>> (eg. fast or slow, aka performance) - they are essentially execution > and > >>>> implementation details. They change often in exploration or > development > >>>> stages, but not quite frequently in well-defined long-running > pipelines. > >>>> They should always have default values and can be missing in query. > They > >>>> can be part of a table DDL/definition, but should also be replaceable > in a > >>>> query - *this is what table "hints" in FLIP-113 should cover*. > >>>> > >>>> > >>>> - type 3: Semantic params, like kafka connector's start offset. > >>>> > >>>> Such params affect query results - the semantics. They'd better be as > >>>> filter conditions in WHERE clause that can be pushed down. They change > >>>> almost every time a query starts and have nothing to do with > metadata, thus > >>>> should not be part of table definition/DDL, nor be persisted in > catalogs. > >>>> If they will, users should create views to keep such params around > (note > >>>> this is different from variable substitution). > >>>> > >>>> > >>>> Take Flink-Kafka as an example. Once we get these params right, > here're the > >>>> steps users need to do to develop and run a Flink job: > >>>> - configure a Flink ConfluentSchemaRegistry with url, username, and > password > >>>> - run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" > (simplified > >>>> timestamp) in SQL CLI, Flink automatically retrieves all metadata of > >>>> schema, file format, etc and start the job > >>>> - users want to make the job read Kafka topic faster, so it goes as > "SELECT > >>>> * FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm > yesterday" > >>>> - done and satisfied, users submit it to production > >>>> > >>>> > >>>> Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think it's a > >>>> nice-to-have feature, but not a strategically critical, long-term > solution, > >>>> because > >>>> 1) It may seem promising at the current stage to solve the > >>>> too-much-manual-work problem, but that's only because Flink hasn't > >>>> leveraged catalogs well and handled the 3 types of params above > properly. > >>>> Once we get the params types right, the LIKE syntax won't be that > >>>> important, and will be just an easier way to create tables without > retyping > >>>> long fields like username and pwd. > >>>> 2) Note that only some rare type of catalog can store k-v property > pair, so > >>>> table created this way often cannot be persisted. In the foreseeable > >>>> future, such catalog will only be HiveCatalog, and not everyone has a > Hive > >>>> metastore. To be honest, without persistence, recreating tables every > time > >>>> this way is still a lot of keyboard typing. > >>>> > >>>> Cheers, > >>>> Bowen > >>>> > >>>> On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote: > >>>> > >>>>> If a specific connector want to have such parameter and read if out > of > >>>>> configuration, then that's fine. > >>>>> If we are talking about a configuration for all kinds of sources, I > would > >>>>> be super careful about that. > >>>>> It's true it can solve maybe 80% cases, but it will also make the > left 20% > >>>>> feels weird. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote: > >>>>> > >>>>>> Hi Kurt, > >>>>>> > >>>>>> #3 Regarding to global offset: > >>>>>> I'm not saying to use the global configuration to override connector > >>>>>> properties by the planner. > >>>>>> But the connector should take this configuration and translate into > their > >>>>>> client API. > >>>>>> AFAIK, almost all the message queues support eariliest and latest > and a > >>>>>> timestamp value as start point. > >>>>>> So we can support 3 options for this configuration: "eariliest", > "latest" > >>>>>> and a timestamp string value. > >>>>>> Of course, this can't solve 100% cases, but I guess can sovle 80% > or 90% > >>>>>> cases. > >>>>>> And the remaining cases can be resolved by LIKE syntax which I > guess is > >>>>> not > >>>>>> very common cases. > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>> On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote: > >>>>>> > >>>>>>> Good to have such lovely discussions. I also want to share some of > my > >>>>>>> opinions. > >>>>>>> > >>>>>>> #1 Regarding to error handling: I also think ignore invalid hints > would > >>>>>> be > >>>>>>> dangerous, maybe > >>>>>>> the simplest solution is just throw an exception. > >>>>>>> > >>>>>>> #2 Regarding to property replacement: I don't think we should > >>>>> constraint > >>>>>>> ourself to > >>>>>>> the meaning of the word "hint", and forbidden it modifying any > >>>>> properties > >>>>>>> which can effect > >>>>>>> query results. IMO `PROPERTIES` is one of the table hints, and a > >>>>> powerful > >>>>>>> one. It can > >>>>>>> modify properties located in DDL's WITH block. But I also see the > harm > >>>>>> that > >>>>>>> if we make it > >>>>>>> too flexible like change the kafka topic name with a hint. Such use > >>>>> case > >>>>>> is > >>>>>>> not common and > >>>>>>> sounds very dangerous to me. I would propose we have a map of > hintable > >>>>>>> properties for each > >>>>>>> connector, and should validate all passed in properties are > actually > >>>>>>> hintable. And combining with > >>>>>>> #1 error handling, we can throw an exception once received invalid > >>>>>>> property. > >>>>>>> > >>>>>>> #3 Regarding to global offset: I'm not sure it's feasible. > Different > >>>>>>> connectors will have totally > >>>>>>> different properties to represent offset, some might be timestamps, > >>>>> some > >>>>>>> might be string literals > >>>>>>> like "earliest", and others might be just integers. > >>>>>>> > >>>>>>> Best, > >>>>>>> Kurt > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote: > >>>>>>> > >>>>>>>> Hi everyone, > >>>>>>>> > >>>>>>>> I want to jump in the discussion about the "dynamic start offset" > >>>>>>> problem. > >>>>>>>> First of all, I share the same concern with Timo and Fabian, that > the > >>>>>>>> "start offset" affects the query semantics, i.e. the query result. > >>>>>>>> But "hints" is just used for optimization which should affect the > >>>>>> result? > >>>>>>>> > >>>>>>>> I think the "dynamic start offset" is an very important usability > >>>>>> problem > >>>>>>>> which will be faced by many streaming platforms. > >>>>>>>> I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH > >>>>>>>> ('connector.startup-timestamp-millis' = '1578538374471')" is > verbose, > >>>>>>> what > >>>>>>>> if we have 10 tables to join? > >>>>>>>> > >>>>>>>> However, what I want to propose (should be another thread) is a > >>>>> global > >>>>>>>> configuration to reset start offsets of all the source connectors > >>>>>>>> in the query session, e.g. "table.sources.start-offset". This is > >>>>>> possible > >>>>>>>> now because `TableSourceFactory.Context` has `getConfiguration` > >>>>>>>> method to get the session configuration, and use it to create an > >>>>>> adapted > >>>>>>>> TableSource. > >>>>>>>> Then we can also expose to SQL CLI via SET command, e.g. `SET > >>>>>>>> 'table.sources.start-offset'='earliest';`, which is pretty simple > and > >>>>>>>> straightforward. > >>>>>>>> > >>>>>>>> This is very similar to KSQL's `SET > 'auto.offset.reset'='earliest'` > >>>>>> which > >>>>>>>> is very helpful IMO. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> > >>>>>>>> On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> > >>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi Danny, > >>>>>>>>> > >>>>>>>>> compared to the hints, FLIP-110 is fully compliant to the SQL > >>>>>> standard. > >>>>>>>>> > >>>>>>>>> I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH > >>>>> (k=v)` > >>>>>> is > >>>>>>>>> too verbose or awkward for the power of basically changing the > >>>>> entire > >>>>>>>>> connector. Usually, this statement would just precede the query > in > >>>>> a > >>>>>>>>> multiline file. So it can be change "in-place" like the hints you > >>>>>>>> proposed. > >>>>>>>>> > >>>>>>>>> Many companies have a well-defined set of tables that should be > >>>>> used. > >>>>>>> It > >>>>>>>>> would be dangerous if users can change the path or topic in a > hint. > >>>>>> The > >>>>>>>>> catalog/catalog manager should be the entity that controls which > >>>>>> tables > >>>>>>>>> exist and how they can be accessed. > >>>>>>>>> > >>>>>>>>>> what’s the problem there if we user the table hints to support > >>>>>>> “start > >>>>>>>>> offset”? > >>>>>>>>> > >>>>>>>>> IMHO it violates the meaning of a hint. According to the > >>>>> dictionary, > >>>>>> a > >>>>>>>>> hint is "a statement that expresses indirectly what one prefers > not > >>>>>> to > >>>>>>>>> say explicitly". But offsets are a property that are very > explicit. > >>>>>>>>> > >>>>>>>>> If we go with the hint approach, it should be expressible in the > >>>>>>>>> TableSourceFactory which properties are supported for hinting. Or > >>>>> do > >>>>>>> you > >>>>>>>>> plan to offer those hints in a separate Map<String, String> that > >>>>>> cannot > >>>>>>>>> overwrite existing properties? I think this would be a different > >>>>>>> story... > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 10.03.20 10:34, Danny Chan wrote: > >>>>>>>>>> Thanks Timo ~ > >>>>>>>>>> > >>>>>>>>>> Personally I would say that offset > 0 and start offset = 10 > does > >>>>>> not > >>>>>>>>> have the same semantic, so from the SQL aspect, we can not > >>>>> implement > >>>>>> a > >>>>>>>>> “starting offset” hint for query with such a syntax. > >>>>>>>>>> > >>>>>>>>>> And the CREATE TABLE LIKE syntax is a DDL which is just verbose > >>>>> for > >>>>>>>>> defining such dynamic parameters even if it could do that, shall > we > >>>>>>> force > >>>>>>>>> users to define a temporal table for each query with dynamic > >>>>> params, > >>>>>> I > >>>>>>>>> would say it’s an awkward solution. > >>>>>>>>>> > >>>>>>>>>> "Hints should give "hints" but not affect the actual produced > >>>>>>> result.” > >>>>>>>>> You mentioned that multiple times and could we give a reason, > >>>>> what’s > >>>>>>> the > >>>>>>>>> problem there if we user the table hints to support “start > offset” > >>>>> ? > >>>>>>> From > >>>>>>>>> my side I saw some benefits for that: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> • It’s very convent to set up these parameters, the syntax is > >>>>> very > >>>>>>> much > >>>>>>>>> like the DDL definition > >>>>>>>>>> • It’s scope is very clear, right on the table it attathed > >>>>>>>>>> • It does not affect the table schema, which means in order to > >>>>>>> specify > >>>>>>>>> the offset, there is no need to define an offset column which is > >>>>>> weird > >>>>>>>>> actually, offset should never be a column, it’s more like a > >>>>> metadata > >>>>>>> or a > >>>>>>>>> start option. > >>>>>>>>>> > >>>>>>>>>> So in total, FLIP-110 uses the offset more like a Hive partition > >>>>>>> prune, > >>>>>>>>> we can do that if we have an offset column, but most of the case > we > >>>>>> do > >>>>>>>> not > >>>>>>>>> define that, so there is actually no conflict or overlap. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Danny Chan > >>>>>>>>>> 在 2020年3月10日 +0800 PM4:28,Timo Walther <twal...@apache.org>,写道: > >>>>>>>>>>> Hi Danny, > >>>>>>>>>>> > >>>>>>>>>>> shouldn't FLIP-110[1] solve most of the problems we have around > >>>>>>>> defining > >>>>>>>>>>> table properties more dynamically without manual schema work? > >>>>> Also > >>>>>>>>>>> offset definition is easier with such a syntax. They must not > be > >>>>>>>> defined > >>>>>>>>>>> in catalog but could be temporary tables that extend from the > >>>>>>> original > >>>>>>>>>>> table. > >>>>>>>>>>> > >>>>>>>>>>> In general, we should aim to keep the syntax concise and don't > >>>>>>> provide > >>>>>>>>>>> too many ways of doing the same thing. Hints should give > "hints" > >>>>>> but > >>>>>>>> not > >>>>>>>>>>> affect the actual produced result. > >>>>>>>>>>> > >>>>>>>>>>> Some connector properties might also change the plan or schema > >>>>> in > >>>>>>> the > >>>>>>>>>>> future. E.g. they might also define whether a table source > >>>>>> supports > >>>>>>>>>>> certain push-downs (e.g. predicate push-down). > >>>>>>>>>>> > >>>>>>>>>>> Dawid is currently working a draft that might makes it possible > >>>>> to > >>>>>>>>>>> expose a Kafka offset via the schema such that `SELECT * FROM > >>>>>> Topic > >>>>>>>>>>> WHERE offset > 10` would become possible and could be pushed > >>>>> down. > >>>>>>> But > >>>>>>>>>>> this is of course, not planned initially. > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> [1] > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 10.03.20 08:34, Danny Chan wrote: > >>>>>>>>>>>> Thanks Wenlong ~ > >>>>>>>>>>>> > >>>>>>>>>>>> For PROPERTIES Hint Error handling > >>>>>>>>>>>> > >>>>>>>>>>>> Actually we have no way to figure out whether a error prone > >>>>> hint > >>>>>>> is a > >>>>>>>>> PROPERTIES hint, for example, if use writes a hint like > >>>>> ‘PROPERTIAS’, > >>>>>>> we > >>>>>>>> do > >>>>>>>>> not know if this hint is a PROPERTIES hint, what we know is that > >>>>> the > >>>>>>> hint > >>>>>>>>> name was not registered in our Flink. > >>>>>>>>>>>> > >>>>>>>>>>>> If the user writes the hint name correctly (i.e. PROPERTIES), > >>>>> we > >>>>>>> did > >>>>>>>>> can enforce the validation of the hint options though the > pluggable > >>>>>>>>> HintOptionChecker. > >>>>>>>>>>>> > >>>>>>>>>>>> For PROPERTIES Hint Option Format > >>>>>>>>>>>> > >>>>>>>>>>>> For a key value style hint option, the key can be either a > >>>>> simple > >>>>>>>>> identifier or a string literal, which means that it’s compatible > >>>>> with > >>>>>>> our > >>>>>>>>> DDL syntax. We support simple identifier because many other hints > >>>>> do > >>>>>>> not > >>>>>>>>> have the component complex keys like the table properties, and we > >>>>>> want > >>>>>>> to > >>>>>>>>> unify the parse block. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Danny Chan > >>>>>>>>>>>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl < > wenlong88....@gmail.com > >>>>>>>> ,写道: > >>>>>>>>>>>>> Hi Danny, thanks for the proposal. +1 for adding table hints, > >>>>> it > >>>>>>> is > >>>>>>>>> really > >>>>>>>>>>>>> a necessary feature for flink sql to integrate with a > catalog. > >>>>>>>>>>>>> > >>>>>>>>>>>>> For error handling, I think it would be more natural to throw > >>>>> an > >>>>>>>>>>>>> exception when error table hint provided, because the > >>>>> properties > >>>>>>> in > >>>>>>>>> hint > >>>>>>>>>>>>> will be merged and used to find the table factory which would > >>>>>>> cause > >>>>>>>> an > >>>>>>>>>>>>> exception when error properties provided, right? On the other > >>>>>>> hand, > >>>>>>>>> unlike > >>>>>>>>>>>>> other hints which just affect the way to execute the query, > >>>>> the > >>>>>>>>> property > >>>>>>>>>>>>> table hint actually affects the result of the query, we > should > >>>>>>> never > >>>>>>>>> ignore > >>>>>>>>>>>>> the given property hints. > >>>>>>>>>>>>> > >>>>>>>>>>>>> For the format of property hints, currently, in sql client, > we > >>>>>>>> accept > >>>>>>>>>>>>> properties in format of string only in DDL: > >>>>>>>> 'connector.type'='kafka', > >>>>>>>>> I > >>>>>>>>>>>>> think the format of properties in hint should be the same as > >>>>> the > >>>>>>>>> format we > >>>>>>>>>>>>> defined in ddl. What do you think? > >>>>>>>>>>>>> > >>>>>>>>>>>>> Bests, > >>>>>>>>>>>>> Wenlong Lyu > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan < > >>>>> yuzhao....@gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> To Weike: About the Error Handing > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> To be consistent with other SQL vendors, the default is to > >>>>> log > >>>>>>>>> warnings > >>>>>>>>>>>>>> and if there is any error (invalid hint name or options), > the > >>>>>>> hint > >>>>>>>>> is just > >>>>>>>>>>>>>> ignored. I have already addressed in the wiki. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> To Timo: About the PROPERTIES Table Hint > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> • The properties hints is also optional, user can pass in an > >>>>>>> option > >>>>>>>>> to > >>>>>>>>>>>>>> override the table properties but this does not mean it is > >>>>>>>> required. > >>>>>>>>>>>>>> • They should not include semantics: does the properties > >>>>> belong > >>>>>>> to > >>>>>>>>>>>>>> semantic ? I don't think so, the plan does not change right > ? > >>>>>> The > >>>>>>>>> result > >>>>>>>>>>>>>> set may be affected, but there are already some hints do so, > >>>>>> for > >>>>>>>>> example, > >>>>>>>>>>>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1] > >>>>>>>>>>>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL > >>>>>>> standard > >>>>>>>>>>>>>> compared to the hints way(which is included in comments) > >>>>>>>>>>>>>> • I actually didn't found any vendors to support such > >>>>> grammar, > >>>>>>> and > >>>>>>>>> there > >>>>>>>>>>>>>> is no way to override table level properties dynamically. > For > >>>>>>>> normal > >>>>>>>>> RDBMS, > >>>>>>>>>>>>>> I think there are no requests for such dynamic parameters > >>>>>> because > >>>>>>>>> all the > >>>>>>>>>>>>>> table have the same storage and computation and they are > >>>>> almost > >>>>>>> all > >>>>>>>>> batch > >>>>>>>>>>>>>> tables. > >>>>>>>>>>>>>> • While Flink as a computation engine has many connectors, > >>>>>>>>> especially for > >>>>>>>>>>>>>> some message queue like Kafka, we would have a start_offset > >>>>>> which > >>>>>>>> is > >>>>>>>>>>>>>> different each time we start the query, such parameters can > >>>>> not > >>>>>>> be > >>>>>>>>>>>>>> persisted to catalog, because it’s not static, this is > >>>>> actually > >>>>>>> the > >>>>>>>>>>>>>> background we propose the table hints to indicate such > >>>>>> properties > >>>>>>>>>>>>>> dynamically. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> To Jark and Jinsong: I have removed the query hints part and > >>>>>>> change > >>>>>>>>> the > >>>>>>>>>>>>>> title. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> Danny Chan > >>>>>>>>>>>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <twal...@apache.org > >>>>>> ,写道: > >>>>>>>>>>>>>>> Hi Danny, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> thanks for the proposal. I agree with Jark and Jingsong. > >>>>>> Planner > >>>>>>>>> hints > >>>>>>>>>>>>>>> and table hints are orthogonal topics that should be > >>>>> discussed > >>>>>>>>>>>>>> separately. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I share Jingsong's opinion that we should not use planner > >>>>>> hints > >>>>>>>> for > >>>>>>>>>>>>>>> passing connector properties. Planner hints should be > >>>>> optional > >>>>>>> at > >>>>>>>>> any > >>>>>>>>>>>>>>> time. They should not include semantics but only affect > >>>>>>> execution > >>>>>>>>> time. > >>>>>>>>>>>>>>> Connector properties are an important part of the query > >>>>>> itself. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Have you thought about options such as `SELECT * FROM > t(k=v, > >>>>>>>> k=v)`? > >>>>>>>>> How > >>>>>>>>>>>>>>> are other vendors deal with this problem? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 09.03.20 10:37, Jingsong Li wrote: > >>>>>>>>>>>>>>>> Hi Danny, +1 for table hints, thanks for driving. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I took a look to FLIP, most of content are talking about > >>>>>> query > >>>>>>>>> hints. > >>>>>>>>>>>>>> It is > >>>>>>>>>>>>>>>> hard to discussion and voting. So +1 to split it as Jark > >>>>>> said. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Another thing is configuration that suitable to config > with > >>>>>>> table > >>>>>>>>>>>>>> hints: > >>>>>>>>>>>>>>>> "connector.path" and "connector.topic", Are they really > >>>>>>> suitable > >>>>>>>>> for > >>>>>>>>>>>>>> table > >>>>>>>>>>>>>>>> hints? Looks weird to me. Because I think these properties > >>>>>> are > >>>>>>>> the > >>>>>>>>>>>>>> core of > >>>>>>>>>>>>>>>> table. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>> Jingsong Lee > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> > >>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks Danny for starting the discussion. > >>>>>>>>>>>>>>>>> +1 for this feature. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If we just focus on the table hints not the query hints > in > >>>>>>> this > >>>>>>>>>>>>>> release, > >>>>>>>>>>>>>>>>> could you split the FLIP into two FLIPs? > >>>>>>>>>>>>>>>>> Because it's hard to vote on partial part of a FLIP. You > >>>>> can > >>>>>>>> keep > >>>>>>>>>>>>>> the table > >>>>>>>>>>>>>>>>> hints proposal in FLIP-113 and move query hints into > >>>>> another > >>>>>>>> FLIP. > >>>>>>>>>>>>>>>>> So that we can focuse on the table hints in the FLIP. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike < > >>>>>>>> kyled...@connect.hku.hk > >>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi Danny, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> This is a nice feature, +1. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> One thing I am interested in but not mentioned in the > >>>>>>> proposal > >>>>>>>> is > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>> error > >>>>>>>>>>>>>>>>>> handling, as it is quite common for users to write > >>>>>>>> inappropriate > >>>>>>>>>>>>>> hints in > >>>>>>>>>>>>>>>>>> SQL code, if illegal or "bad" hints are given, would the > >>>>>>> system > >>>>>>>>>>>>>> simply > >>>>>>>>>>>>>>>>>> ignore them or throw exceptions? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks : ) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>> Weike > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan < > >>>>>>>> yuzhao....@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Note: > >>>>>>>>>>>>>>>>>>> we only plan to support table hints in Flink release > >>>>> 1.11, > >>>>>>> so > >>>>>>>>>>>>>> please > >>>>>>>>>>>>>>>>>> focus > >>>>>>>>>>>>>>>>>>> mainly on the table hints part and just ignore the > >>>>> planner > >>>>>>>>>>>>>> hints, sorry > >>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>> that mistake ~ > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>> Danny Chan > >>>>>>>>>>>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan < > >>>>> yuzhao....@gmail.com > >>>>>>>>> ,写道: > >>>>>>>>>>>>>>>>>>>> Hi, fellows ~ > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I would like to propose the supports for SQL hints for > >>>>>> our > >>>>>>>>>>>>>> Flink SQL. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> We would support hints syntax as following: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', > >>>>>>>>>>>>>> parallelism='24') */ > >>>>>>>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */ > >>>>>>>>>>>>>>>>>>>> join > >>>>>>>>>>>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */ > >>>>>>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>> emp.deptno = dept.deptno > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Basically we would support both query hints(after the > >>>>>>> SELECT > >>>>>>>>>>>>>> keyword) > >>>>>>>>>>>>>>>>>>> and table hints(after the referenced table name), for > >>>>>> 1.11, > >>>>>>> we > >>>>>>>>>>>>>> plan to > >>>>>>>>>>>>>>>>>> only > >>>>>>>>>>>>>>>>>>> support table hints with a hint probably named > >>>>> PROPERTIES: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/ > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I am looking forward to your comments. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> You can access the FLIP here: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>> Danny Chan > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > >> > > > >