Hi everyone, When reviewing the name of the hint option 'miss-retry'='true|false', I feel the current name is not precise enough, it might be easier to understand by using the retry-predicate directly from flip-232, e.g. 'retry-predicate'='lookup-miss', which has the additional benefit of extensibility(maybe more retry condition in the future).
Jark & Jingsong, do you have any suggestions? If we agree with the name 'retry-predicate' or other better candidate, I'll update the FLIP. Best, Lincoln Lee Lincoln Lee <lincoln.8...@gmail.com> 于2022年6月2日周四 11:23写道: > Hi everyone, > > I've updated the FLIP[1] based on this discussion thread that we agree to > have a single unified 'LOOKUP' hint and also a related part in FLIP-221[2] > which is mainly for the necessity of the common table option > 'lookup.async'. > > The main updates are: > 1. the new unified 'LOOKUP' hint, make retry support both on sync and > async lookup > 2. clarify the default choice of the planner for those connectors which > have both sync and async lookup capabilities, and how to deal with the > query hint > 3. will add a followup issue to discuss whether to remove the > 'lookup.async' option in HBase connector. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems > [2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq > > Best, > Lincoln Lee > > > Lincoln Lee <lincoln.8...@gmail.com> 于2022年6月1日周三 16:03写道: > >> Hi Jingsong, >> >> There will be no change for connectors with only one capability (sync or >> async). >> >> Query hint works in a best effort manner, so if users specifies a hint >> with invalid option, the query plan keeps unchanged, e.g., use >> LOOKUP('table'='customer', 'async'='true'), but backend lookup source only >> implemented the sync lookup function, then the async lookup hint takes no >> effect. >> >> For these connectors which can have both capabilities of async and sync >> lookup, our advice for the connector developer is implementing both sync >> and async interfaces if both capabilities have suitable use cases, and the >> planner can decide which capability is the preferable one based on cost >> model or maybe other mechanism (another use case is exactly what we're >> discussing here, users can give the query hint), otherwise choose one >> interface to implement. >> >> Also, this should be clarified for the lookup function related APIs. >> >> Best, >> Lincoln Lee >> >> >> Jingsong Li <jingsongl...@gmail.com> 于2022年6月1日周三 15:18写道: >> >>> Hi Lincoln, >>> >>> > It's better making decisions at the query level when a connector has >>> both >>> capabilities. >>> >>> Can you clarify the mechanism? >>> - only sync connector: What connector developers should do >>> - only async connector: What connector developers should do >>> - both async and sync connector: What connector developers should do >>> >>> Best, >>> Jingsong >>> >>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <lincoln.8...@gmail.com> >>> wrote: >>> >>> > Hi Jingsong, >>> > >>> > Thanks for your feedback! >>> > >>> > Yes, the existing HBase connector use an option 'lookup.async' to >>> control >>> > its lookup source implementations that exposed to the planner, however >>> it's >>> > a private option for the HBase connector only, so it will not affect >>> the >>> > common API. >>> > >>> > And as discussed in the mailing thread of FLIP-221[1], we got a >>> consensus >>> > that do not make it as a common option. It's better making decisions >>> at the >>> > query level when a connector has both capabilities. >>> > >>> > So if everything goes well, we should discuss it whether to deprecate >>> > the 'lookup.async' >>> > or not for HBase connector after the hint been done. >>> > >>> > This will be mentioned in the Compatibility part of this FLIP[2]. >>> > >>> > WDYT? >>> > >>> > [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox >>> > [2]: >>> > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode >>> > >>> > Best, >>> > Lincoln Lee >>> > >>> > >>> > Jingsong Li <jingsongl...@gmail.com> 于2022年6月1日周三 14:11写道: >>> > >>> > > Hi Lincoln, >>> > > >>> > > The unified lookup hint is what I want. >>> > > >>> > > And I like 'async'='true|false' option. >>> > > >>> > > But there is a compatibility issue, as I remember if async is >>> currently >>> > > controlled by connector, and this may also require some API changes? >>> > > >>> > > We need to have a clear story for the connector combined with this >>> > option: >>> > > - only sync connector >>> > > - only async connector >>> > > - both async and sync connector >>> > > >>> > > Best, >>> > > Jingsong >>> > > >>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <lincoln.8...@gmail.com> >>> > > wrote: >>> > > >>> > > > Thanks Jark for your quick response and the consensus! >>> > > > >>> > > > And I will update the FLIP after Jingsong or other developers >>> confirm >>> > > that >>> > > > there is no problem. >>> > > > >>> > > > Best, >>> > > > Lincoln Lee >>> > > > >>> > > > >>> > > > Jark Wu <imj...@gmail.com> 于2022年5月30日周一 15:49写道: >>> > > > >>> > > > > Thanks for the update. >>> > > > > >>> > > > > The unified lookup hint looks good to me. >>> > > > > And thanks for explaining ALLOW_UNORDERED. >>> > > > > >>> > > > > Best, >>> > > > > Jark >>> > > > > >>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee < >>> lincoln.8...@gmail.com> >>> > > > wrote: >>> > > > > >>> > > > > > Hi Jark & Jingsong, >>> > > > > > >>> > > > > > Thanks for your feedback! >>> > > > > > >>> > > > > > 1.) support retry on sync lookup >>> > > > > > I also agree with supporting it, this will be useful for >>> connectors >>> > > > that >>> > > > > > don't have asynchronous lookup implementations and can also >>> solve >>> > the >>> > > > > ASYNC >>> > > > > > non-target problem to some extent(because the retrying is >>> blocking >>> > > for >>> > > > > sync >>> > > > > > lookup, and may accumulate delay, but it maybe acceptable for >>> the >>> > > case >>> > > > > that >>> > > > > > most or all data want to do a delayed lookup). >>> > > > > > >>> > > > > > For the api perspective, we can do some unification. Let's >>> think of >>> > > the >>> > > > > > whole user story for lookup join: >>> > > > > > >>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP can share a common one: LOOKUP by >>> > > > different >>> > > > > > hint option values: 'async'='true|false' >>> > > > > > >>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the >>> > > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false' >>> > > > > > >>> > > > > > we can use one single hint LOOKUP with different hint options >>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all >>> > > related >>> > > > > > functionalities. >>> > > > > > Compared to multiple hints with different subsets of >>> > functionality, a >>> > > > > > single hint may be easier for users to understand and use, and >>> > > specific >>> > > > > > parameters can be quickly found through documentation >>> > > > > > >>> > > > > > the support matrix will be: >>> > > > > > lookup support async retry >>> > > > > > sync w/o retry N N >>> > > > > > sync w/ retry N Y >>> > > > > > async w/o retry Y N >>> > > > > > async w/ retry Y Y >>> > > > > > >>> > > > > > and the available hint options for each mode: >>> > > > > > mode support hint options >>> > > > > > async async'='true' >>> > > > > > 'output-mode'='ordere|allow-unordered' >>> > > > > > 'capacity'='100' >>> > > > > > 'timeout'='180s' >>> > > > > > retry miss-retry'='true' >>> > > > > > 'retry-strategy'='fixed-delay' >>> > > > > > 'delay'='10s' >>> > > > > > 'max-attempts'='3' >>> > > > > > >>> > > > > > >>> > > > > > 2.) 'allow-unordered' vs 'unordered' for >>> > > > > > 'table.exec.async-lookup.output-mode' >>> > > > > > Yes, make it align with DataStream Api maybe more intuitive, >>> but >>> > > > there's >>> > > > > > some difference in table layer that makes the 'allow-unordered' >>> > > > > meaningful: >>> > > > > > updates in the pipeline need to be processed in order, >>> > > ALLOW_UNORDERED >>> > > > > > means if users allow unordered result, it will attempt to use >>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect >>> the >>> > > > > > correctness of the result, otherwise ORDERED will be still >>> used. >>> > > > > > >>> > > > > > Another choice is that when the user specifies unordered mode, >>> > > planner >>> > > > > > throws an error when it finds that it may affect correctness. >>> But >>> > > this >>> > > > is >>> > > > > > not user-friendly and is not consistent with the customary >>> > treatment >>> > > of >>> > > > > > invalid query hints(best effort). >>> > > > > > >>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759 for >>> the >>> > new >>> > > > > > option >>> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion on >>> > > > > FLINK-27625: >>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the >>> > changes >>> > > > > were >>> > > > > > relatively minor, no new flip was created) >>> > > > > > >>> > > > > > If we can reach a consensus on the single unified hint, e.g., >>> > LOOKUP, >>> > > > > then >>> > > > > > FLINK-27625 can be covered. >>> > > > > > >>> > > > > > WDYT? >>> > > > > > >>> > > > > > Best, >>> > > > > > Lincoln Lee >>> > > > > > >>> > > > > > >>> > > > > > Jark Wu <imj...@gmail.com> 于2022年5月27日周五 21:04写道: >>> > > > > > >>> > > > > > > Hi Lincoln, >>> > > > > > > >>> > > > > > > Delayed Dim Join is a frequently requested feature, it's >>> exciting >>> > > to >>> > > > > see >>> > > > > > > this feature is on the road. >>> > > > > > > The FLIP looks good to me in general. I only left some minor >>> > > > comments. >>> > > > > > > >>> > > > > > > 1) support retry for sync lookup >>> > > > > > > I'm also fine with the idea proposed by Jingsong. But this >>> > doesn't >>> > > > > > conflict >>> > > > > > > with the FLIP and can >>> > > > > > > be future work. It would be great if we can determine the >>> APIs >>> > > first. >>> > > > > > > >>> > > > > > > 1) "allow-unordered" => "unordered" >>> > > > > > > I would prefer the "unordered" output mode rather than >>> > > > > "allow-unordered". >>> > > > > > > Because this fully aligns with the DataStream behaviors and >>> > avoids >>> > > > > > > confusion on the differences. >>> > > > > > > I understand the purpose that adding a "allow" prefix here, >>> but I >>> > > > think >>> > > > > > the >>> > > > > > > semantic is fine to just >>> > > > > > > use "unordered" here. We didn't see any users confused about >>> > > > > > > OutputMode#UNORDERED. >>> > > > > > > >>> > > > > > > Best, >>> > > > > > > Jark >>> > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li < >>> > jingsongl...@gmail.com> >>> > > > > > wrote: >>> > > > > > > >>> > > > > > > > Thanks Lincoln for your proposal. >>> > > > > > > > >>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration, >>> e.g., >>> > 10s >>> > > > > > > > max-attempts: integer, e.g., 3`. >>> > > > > > > > >>> > > > > > > > Are these options only for async? It looks like normal >>> lookups >>> > > work >>> > > > > > too? >>> > > > > > > > >>> > > > > > > > One thing is: most of the lookup functions seem to be >>> > synchronous >>> > > > > now? >>> > > > > > > > There are not so many asynchronous ones? >>> > > > > > > > >>> > > > > > > > Best, >>> > > > > > > > Jingsong >>> > > > > > > > >>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee < >>> > > > lincoln.8...@gmail.com >>> > > > > > >>> > > > > > > > wrote: >>> > > > > > > > >>> > > > > > > > > Hi all, >>> > > > > > > > > >>> > > > > > > > > Considering the new common table option >>> 'lookup.max-retries' >>> > > > > proposed >>> > > > > > > in >>> > > > > > > > > FLIP-221[1] which is commonly used for exception >>> handling in >>> > > > > > connector >>> > > > > > > > > implementation, we should clearly distinguish >>> > > ASYNC_LOOKUP_RETRY >>> > > > > from >>> > > > > > > it >>> > > > > > > > to >>> > > > > > > > > avoid confusing users. >>> > > > > > > > > >>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to >>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY, and as the name implies, >>> restrict >>> > it >>> > > to >>> > > > > > > support >>> > > > > > > > > retries only for lookup misses and no longer include >>> > exceptions >>> > > > > (for >>> > > > > > > sql >>> > > > > > > > > connectors, let the connector implementer decide how to >>> > handle >>> > > > > > > exceptions >>> > > > > > > > > since there are various kinds of retryable exceptions >>> and can >>> > > not >>> > > > > > retry >>> > > > > > > > > ones). >>> > > > > > > > > >>> > > > > > > > > The FLIP[2] has been updated. >>> > > > > > > > > >>> > > > > > > > > [1] >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric >>> > > > > > > > > [2] >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > Best, >>> > > > > > > > > Lincoln Lee >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > Lincoln Lee <lincoln.8...@gmail.com> 于2022年5月19日周四 >>> 18:24写道: >>> > > > > > > > > >>> > > > > > > > > > Dear Flink developers, >>> > > > > > > > > > >>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to >>> > support >>> > > > > > > retryable >>> > > > > > > > > > lookup join to solve delayed updates issue, as a >>> pre-work >>> > for >>> > > > > this >>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic >>> > retry >>> > > > > > support >>> > > > > > > > for >>> > > > > > > > > > Async I/O. >>> > > > > > > > > > We prefer to offer this retry capability via query >>> hints, >>> > > > similar >>> > > > > > to >>> > > > > > > > new >>> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4]. >>> > > > > > > > > > >>> > > > > > > > > > This feature is backwards compatible and transparently >>> to >>> > > > > > connectors. >>> > > > > > > > For >>> > > > > > > > > > existing connectors which implements >>> AsyncTableFunction, >>> > can >>> > > > > easily >>> > > > > > > > > enable >>> > > > > > > > > > async retry via the new join hint. >>> > > > > > > > > > >>> > > > > > > > > > [1] >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems >>> > > > > > > > > > [2] >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963 >>> > > > > > > > > > [3] >>> > > > > > >>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8 >>> > > > > > > > > > [4] >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join >>> > > > > > > > > > >>> > > > > > > > > > Best, >>> > > > > > > > > > Lincoln Lee >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >>