If a specific connector want to have such parameter and read if out of configuration, then that's fine. If we are talking about a configuration for all kinds of sources, I would be super careful about that. It's true it can solve maybe 80% cases, but it will also make the left 20% feels weird.
Best, Kurt On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote: > Hi Kurt, > > #3 Regarding to global offset: > I'm not saying to use the global configuration to override connector > properties by the planner. > But the connector should take this configuration and translate into their > client API. > AFAIK, almost all the message queues support eariliest and latest and a > timestamp value as start point. > So we can support 3 options for this configuration: "eariliest", "latest" > and a timestamp string value. > Of course, this can't solve 100% cases, but I guess can sovle 80% or 90% > cases. > And the remaining cases can be resolved by LIKE syntax which I guess is not > very common cases. > > Best, > Jark > > > On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote: > > > Good to have such lovely discussions. I also want to share some of my > > opinions. > > > > #1 Regarding to error handling: I also think ignore invalid hints would > be > > dangerous, maybe > > the simplest solution is just throw an exception. > > > > #2 Regarding to property replacement: I don't think we should constraint > > ourself to > > the meaning of the word "hint", and forbidden it modifying any properties > > which can effect > > query results. IMO `PROPERTIES` is one of the table hints, and a powerful > > one. It can > > modify properties located in DDL's WITH block. But I also see the harm > that > > if we make it > > too flexible like change the kafka topic name with a hint. Such use case > is > > not common and > > sounds very dangerous to me. I would propose we have a map of hintable > > properties for each > > connector, and should validate all passed in properties are actually > > hintable. And combining with > > #1 error handling, we can throw an exception once received invalid > > property. > > > > #3 Regarding to global offset: I'm not sure it's feasible. Different > > connectors will have totally > > different properties to represent offset, some might be timestamps, some > > might be string literals > > like "earliest", and others might be just integers. > > > > Best, > > Kurt > > > > > > On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote: > > > > > Hi everyone, > > > > > > I want to jump in the discussion about the "dynamic start offset" > > problem. > > > First of all, I share the same concern with Timo and Fabian, that the > > > "start offset" affects the query semantics, i.e. the query result. > > > But "hints" is just used for optimization which should affect the > result? > > > > > > I think the "dynamic start offset" is an very important usability > problem > > > which will be faced by many streaming platforms. > > > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH > > > ('connector.startup-timestamp-millis' = '1578538374471')" is verbose, > > what > > > if we have 10 tables to join? > > > > > > However, what I want to propose (should be another thread) is a global > > > configuration to reset start offsets of all the source connectors > > > in the query session, e.g. "table.sources.start-offset". This is > possible > > > now because `TableSourceFactory.Context` has `getConfiguration` > > > method to get the session configuration, and use it to create an > adapted > > > TableSource. > > > Then we can also expose to SQL CLI via SET command, e.g. `SET > > > 'table.sources.start-offset'='earliest';`, which is pretty simple and > > > straightforward. > > > > > > This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` > which > > > is very helpful IMO. > > > > > > Best, > > > Jark > > > > > > > > > On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> wrote: > > > > > > > Hi Danny, > > > > > > > > compared to the hints, FLIP-110 is fully compliant to the SQL > standard. > > > > > > > > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` > is > > > > too verbose or awkward for the power of basically changing the entire > > > > connector. Usually, this statement would just precede the query in a > > > > multiline file. So it can be change "in-place" like the hints you > > > proposed. > > > > > > > > Many companies have a well-defined set of tables that should be used. > > It > > > > would be dangerous if users can change the path or topic in a hint. > The > > > > catalog/catalog manager should be the entity that controls which > tables > > > > exist and how they can be accessed. > > > > > > > > > what’s the problem there if we user the table hints to support > > “start > > > > offset”? > > > > > > > > IMHO it violates the meaning of a hint. According to the dictionary, > a > > > > hint is "a statement that expresses indirectly what one prefers not > to > > > > say explicitly". But offsets are a property that are very explicit. > > > > > > > > If we go with the hint approach, it should be expressible in the > > > > TableSourceFactory which properties are supported for hinting. Or do > > you > > > > plan to offer those hints in a separate Map<String, String> that > cannot > > > > overwrite existing properties? I think this would be a different > > story... > > > > > > > > Regards, > > > > Timo > > > > > > > > > > > > On 10.03.20 10:34, Danny Chan wrote: > > > > > Thanks Timo ~ > > > > > > > > > > Personally I would say that offset > 0 and start offset = 10 does > not > > > > have the same semantic, so from the SQL aspect, we can not implement > a > > > > “starting offset” hint for query with such a syntax. > > > > > > > > > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for > > > > defining such dynamic parameters even if it could do that, shall we > > force > > > > users to define a temporal table for each query with dynamic params, > I > > > > would say it’s an awkward solution. > > > > > > > > > > "Hints should give "hints" but not affect the actual produced > > result.” > > > > You mentioned that multiple times and could we give a reason, what’s > > the > > > > problem there if we user the table hints to support “start offset” ? > > From > > > > my side I saw some benefits for that: > > > > > > > > > > > > > > > • It’s very convent to set up these parameters, the syntax is very > > much > > > > like the DDL definition > > > > > • It’s scope is very clear, right on the table it attathed > > > > > • It does not affect the table schema, which means in order to > > specify > > > > the offset, there is no need to define an offset column which is > weird > > > > actually, offset should never be a column, it’s more like a metadata > > or a > > > > start option. > > > > > > > > > > So in total, FLIP-110 uses the offset more like a Hive partition > > prune, > > > > we can do that if we have an offset column, but most of the case we > do > > > not > > > > define that, so there is actually no conflict or overlap. > > > > > > > > > > Best, > > > > > Danny Chan > > > > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <twal...@apache.org>,写道: > > > > >> Hi Danny, > > > > >> > > > > >> shouldn't FLIP-110[1] solve most of the problems we have around > > > defining > > > > >> table properties more dynamically without manual schema work? Also > > > > >> offset definition is easier with such a syntax. They must not be > > > defined > > > > >> in catalog but could be temporary tables that extend from the > > original > > > > >> table. > > > > >> > > > > >> In general, we should aim to keep the syntax concise and don't > > provide > > > > >> too many ways of doing the same thing. Hints should give "hints" > but > > > not > > > > >> affect the actual produced result. > > > > >> > > > > >> Some connector properties might also change the plan or schema in > > the > > > > >> future. E.g. they might also define whether a table source > supports > > > > >> certain push-downs (e.g. predicate push-down). > > > > >> > > > > >> Dawid is currently working a draft that might makes it possible to > > > > >> expose a Kafka offset via the schema such that `SELECT * FROM > Topic > > > > >> WHERE offset > 10` would become possible and could be pushed down. > > But > > > > >> this is of course, not planned initially. > > > > >> > > > > >> Regards, > > > > >> Timo > > > > >> > > > > >> > > > > >> [1] > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE > > > > >> > > > > >> > > > > >> > > > > >> On 10.03.20 08:34, Danny Chan wrote: > > > > >>> Thanks Wenlong ~ > > > > >>> > > > > >>> For PROPERTIES Hint Error handling > > > > >>> > > > > >>> Actually we have no way to figure out whether a error prone hint > > is a > > > > PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, > > we > > > do > > > > not know if this hint is a PROPERTIES hint, what we know is that the > > hint > > > > name was not registered in our Flink. > > > > >>> > > > > >>> If the user writes the hint name correctly (i.e. PROPERTIES), we > > did > > > > can enforce the validation of the hint options though the pluggable > > > > HintOptionChecker. > > > > >>> > > > > >>> For PROPERTIES Hint Option Format > > > > >>> > > > > >>> For a key value style hint option, the key can be either a simple > > > > identifier or a string literal, which means that it’s compatible with > > our > > > > DDL syntax. We support simple identifier because many other hints do > > not > > > > have the component complex keys like the table properties, and we > want > > to > > > > unify the parse block. > > > > >>> > > > > >>> Best, > > > > >>> Danny Chan > > > > >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <wenlong88....@gmail.com > > >,写道: > > > > >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it > > is > > > > really > > > > >>>> a necessary feature for flink sql to integrate with a catalog. > > > > >>>> > > > > >>>> For error handling, I think it would be more natural to throw an > > > > >>>> exception when error table hint provided, because the properties > > in > > > > hint > > > > >>>> will be merged and used to find the table factory which would > > cause > > > an > > > > >>>> exception when error properties provided, right? On the other > > hand, > > > > unlike > > > > >>>> other hints which just affect the way to execute the query, the > > > > property > > > > >>>> table hint actually affects the result of the query, we should > > never > > > > ignore > > > > >>>> the given property hints. > > > > >>>> > > > > >>>> For the format of property hints, currently, in sql client, we > > > accept > > > > >>>> properties in format of string only in DDL: > > > 'connector.type'='kafka', > > > > I > > > > >>>> think the format of properties in hint should be the same as the > > > > format we > > > > >>>> defined in ddl. What do you think? > > > > >>>> > > > > >>>> Bests, > > > > >>>> Wenlong Lyu > > > > >>>> > > > > >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> > > > > wrote: > > > > >>>> > > > > >>>>> To Weike: About the Error Handing > > > > >>>>> > > > > >>>>> To be consistent with other SQL vendors, the default is to log > > > > warnings > > > > >>>>> and if there is any error (invalid hint name or options), the > > hint > > > > is just > > > > >>>>> ignored. I have already addressed in the wiki. > > > > >>>>> > > > > >>>>> To Timo: About the PROPERTIES Table Hint > > > > >>>>> > > > > >>>>> • The properties hints is also optional, user can pass in an > > option > > > > to > > > > >>>>> override the table properties but this does not mean it is > > > required. > > > > >>>>> • They should not include semantics: does the properties belong > > to > > > > >>>>> semantic ? I don't think so, the plan does not change right ? > The > > > > result > > > > >>>>> set may be affected, but there are already some hints do so, > for > > > > example, > > > > >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1] > > > > >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL > > standard > > > > >>>>> compared to the hints way(which is included in comments) > > > > >>>>> • I actually didn't found any vendors to support such grammar, > > and > > > > there > > > > >>>>> is no way to override table level properties dynamically. For > > > normal > > > > RDBMS, > > > > >>>>> I think there are no requests for such dynamic parameters > because > > > > all the > > > > >>>>> table have the same storage and computation and they are almost > > all > > > > batch > > > > >>>>> tables. > > > > >>>>> • While Flink as a computation engine has many connectors, > > > > especially for > > > > >>>>> some message queue like Kafka, we would have a start_offset > which > > > is > > > > >>>>> different each time we start the query, such parameters can not > > be > > > > >>>>> persisted to catalog, because it’s not static, this is actually > > the > > > > >>>>> background we propose the table hints to indicate such > properties > > > > >>>>> dynamically. > > > > >>>>> > > > > >>>>> > > > > >>>>> To Jark and Jinsong: I have removed the query hints part and > > change > > > > the > > > > >>>>> title. > > > > >>>>> > > > > >>>>> [1] > > > > >>>>> > > > > > > > > > > https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15 > > > > >>>>> > > > > >>>>> Best, > > > > >>>>> Danny Chan > > > > >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <twal...@apache.org>,写道: > > > > >>>>>> Hi Danny, > > > > >>>>>> > > > > >>>>>> thanks for the proposal. I agree with Jark and Jingsong. > Planner > > > > hints > > > > >>>>>> and table hints are orthogonal topics that should be discussed > > > > >>>>> separately. > > > > >>>>>> > > > > >>>>>> I share Jingsong's opinion that we should not use planner > hints > > > for > > > > >>>>>> passing connector properties. Planner hints should be optional > > at > > > > any > > > > >>>>>> time. They should not include semantics but only affect > > execution > > > > time. > > > > >>>>>> Connector properties are an important part of the query > itself. > > > > >>>>>> > > > > >>>>>> Have you thought about options such as `SELECT * FROM t(k=v, > > > k=v)`? > > > > How > > > > >>>>>> are other vendors deal with this problem? > > > > >>>>>> > > > > >>>>>> Regards, > > > > >>>>>> Timo > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> On 09.03.20 10:37, Jingsong Li wrote: > > > > >>>>>>> Hi Danny, +1 for table hints, thanks for driving. > > > > >>>>>>> > > > > >>>>>>> I took a look to FLIP, most of content are talking about > query > > > > hints. > > > > >>>>> It is > > > > >>>>>>> hard to discussion and voting. So +1 to split it as Jark > said. > > > > >>>>>>> > > > > >>>>>>> Another thing is configuration that suitable to config with > > table > > > > >>>>> hints: > > > > >>>>>>> "connector.path" and "connector.topic", Are they really > > suitable > > > > for > > > > >>>>> table > > > > >>>>>>> hints? Looks weird to me. Because I think these properties > are > > > the > > > > >>>>> core of > > > > >>>>>>> table. > > > > >>>>>>> > > > > >>>>>>> Best, > > > > >>>>>>> Jingsong Lee > > > > >>>>>>> > > > > >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> > > wrote: > > > > >>>>>>> > > > > >>>>>>>> Thanks Danny for starting the discussion. > > > > >>>>>>>> +1 for this feature. > > > > >>>>>>>> > > > > >>>>>>>> If we just focus on the table hints not the query hints in > > this > > > > >>>>> release, > > > > >>>>>>>> could you split the FLIP into two FLIPs? > > > > >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can > > > keep > > > > >>>>> the table > > > > >>>>>>>> hints proposal in FLIP-113 and move query hints into another > > > FLIP. > > > > >>>>>>>> So that we can focuse on the table hints in the FLIP. > > > > >>>>>>>> > > > > >>>>>>>> Thanks, > > > > >>>>>>>> Jark > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike < > > > kyled...@connect.hku.hk > > > > > > > > > >>>>> wrote: > > > > >>>>>>>> > > > > >>>>>>>>> Hi Danny, > > > > >>>>>>>>> > > > > >>>>>>>>> This is a nice feature, +1. > > > > >>>>>>>>> > > > > >>>>>>>>> One thing I am interested in but not mentioned in the > > proposal > > > is > > > > >>>>> the > > > > >>>>>>>> error > > > > >>>>>>>>> handling, as it is quite common for users to write > > > inappropriate > > > > >>>>> hints in > > > > >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the > > system > > > > >>>>> simply > > > > >>>>>>>>> ignore them or throw exceptions? > > > > >>>>>>>>> > > > > >>>>>>>>> Thanks : ) > > > > >>>>>>>>> > > > > >>>>>>>>> Best, > > > > >>>>>>>>> Weike > > > > >>>>>>>>> > > > > >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan < > > > yuzhao....@gmail.com> > > > > >>>>> wrote: > > > > >>>>>>>>> > > > > >>>>>>>>>> Note: > > > > >>>>>>>>>> we only plan to support table hints in Flink release 1.11, > > so > > > > >>>>> please > > > > >>>>>>>>> focus > > > > >>>>>>>>>> mainly on the table hints part and just ignore the planner > > > > >>>>> hints, sorry > > > > >>>>>>>>> for > > > > >>>>>>>>>> that mistake ~ > > > > >>>>>>>>>> > > > > >>>>>>>>>> Best, > > > > >>>>>>>>>> Danny Chan > > > > >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <yuzhao....@gmail.com > > > >,写道: > > > > >>>>>>>>>>> Hi, fellows ~ > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> I would like to propose the supports for SQL hints for > our > > > > >>>>> Flink SQL. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> We would support hints syntax as following: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', > > > > >>>>> parallelism='24') */ > > > > >>>>>>>>>>> from > > > > >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */ > > > > >>>>>>>>>>> join > > > > >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */ > > > > >>>>>>>>>>> on > > > > >>>>>>>>>>> emp.deptno = dept.deptno > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Basically we would support both query hints(after the > > SELECT > > > > >>>>> keyword) > > > > >>>>>>>>>> and table hints(after the referenced table name), for > 1.11, > > we > > > > >>>>> plan to > > > > >>>>>>>>> only > > > > >>>>>>>>>> support table hints with a hint probably named PROPERTIES: > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/ > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> I am looking forward to your comments. > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> You can access the FLIP here: > > > > >>>>>>>>>>> > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints > > > > >>>>>>>>>>> > > > > >>>>>>>>>>> Best, > > > > >>>>>>>>>>> Danny Chan > > > > >>>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>> > > > > >> > > > > > > > > > > > > > > > > > > >