Thanks for the update. The unified lookup hint looks good to me. And thanks for explaining ALLOW_UNORDERED.
Best, Jark On Mon, 30 May 2022 at 15:31, Lincoln Lee <lincoln.8...@gmail.com> wrote: > Hi Jark & Jingsong, > > Thanks for your feedback! > > 1.) support retry on sync lookup > I also agree with supporting it, this will be useful for connectors that > don't have asynchronous lookup implementations and can also solve the ASYNC > non-target problem to some extent(because the retrying is blocking for sync > lookup, and may accumulate delay, but it maybe acceptable for the case that > most or all data want to do a delayed lookup). > > For the api perspective, we can do some unification. Let's think of the > whole user story for lookup join: > > ASYNC_LOOKUP vs SYNC_LOOKUP can share a common one: LOOKUP by different > hint option values: 'async'='true|false' > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false' > > we can use one single hint LOOKUP with different hint options > ('async'='true|false', 'miss-retry'='true|false') to cover all related > functionalities. > Compared to multiple hints with different subsets of functionality, a > single hint may be easier for users to understand and use, and specific > parameters can be quickly found through documentation > > the support matrix will be: > lookup support async retry > sync w/o retry N N > sync w/ retry N Y > async w/o retry Y N > async w/ retry Y Y > > and the available hint options for each mode: > mode support hint options > async async'='true' > 'output-mode'='ordere|allow-unordered' > 'capacity'='100' > 'timeout'='180s' > retry miss-retry'='true' > 'retry-strategy'='fixed-delay' > 'delay'='10s' > 'max-attempts'='3' > > > 2.) 'allow-unordered' vs 'unordered' for > 'table.exec.async-lookup.output-mode' > Yes, make it align with DataStream Api maybe more intuitive, but there's > some difference in table layer that makes the 'allow-unordered' meaningful: > updates in the pipeline need to be processed in order, ALLOW_UNORDERED > means if users allow unordered result, it will attempt to use > AsyncDataStream.OutputMode.UNORDERED when it does not affect the > correctness of the result, otherwise ORDERED will be still used. > > Another choice is that when the user specifies unordered mode, planner > throws an error when it finds that it may affect correctness. But this is > not user-friendly and is not consistent with the customary treatment of > invalid query hints(best effort). > > I opened a pr https://github.com/apache/flink/pull/19759 for the new > option > 'table.exec.async-lookup.output-mode' and also a discussion on FLINK-27625: > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes were > relatively minor, no new flip was created) > > If we can reach a consensus on the single unified hint, e.g., LOOKUP, then > FLINK-27625 can be covered. > > WDYT? > > Best, > Lincoln Lee > > > Jark Wu <imj...@gmail.com> 于2022年5月27日周五 21:04写道: > > > Hi Lincoln, > > > > Delayed Dim Join is a frequently requested feature, it's exciting to see > > this feature is on the road. > > The FLIP looks good to me in general. I only left some minor comments. > > > > 1) support retry for sync lookup > > I'm also fine with the idea proposed by Jingsong. But this doesn't > conflict > > with the FLIP and can > > be future work. It would be great if we can determine the APIs first. > > > > 1) "allow-unordered" => "unordered" > > I would prefer the "unordered" output mode rather than "allow-unordered". > > Because this fully aligns with the DataStream behaviors and avoids > > confusion on the differences. > > I understand the purpose that adding a "allow" prefix here, but I think > the > > semantic is fine to just > > use "unordered" here. We didn't see any users confused about > > OutputMode#UNORDERED. > > > > Best, > > Jark > > > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <jingsongl...@gmail.com> > wrote: > > > > > Thanks Lincoln for your proposal. > > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s > > > max-attempts: integer, e.g., 3`. > > > > > > Are these options only for async? It looks like normal lookups work > too? > > > > > > One thing is: most of the lookup functions seem to be synchronous now? > > > There are not so many asynchronous ones? > > > > > > Best, > > > Jingsong > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <lincoln.8...@gmail.com> > > > wrote: > > > > > > > Hi all, > > > > > > > > Considering the new common table option 'lookup.max-retries' proposed > > in > > > > FLIP-221[1] which is commonly used for exception handling in > connector > > > > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from > > it > > > to > > > > avoid confusing users. > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to > > > > ASYNC_LOOKUP_MISS_RETRY, and as the name implies, restrict it to > > support > > > > retries only for lookup misses and no longer include exceptions (for > > sql > > > > connectors, let the connector implementer decide how to handle > > exceptions > > > > since there are various kinds of retryable exceptions and can not > retry > > > > ones). > > > > > > > > The FLIP[2] has been updated. > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric > > > > [2] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems > > > > > > > > > > > > Best, > > > > Lincoln Lee > > > > > > > > > > > > Lincoln Lee <lincoln.8...@gmail.com> 于2022年5月19日周四 18:24写道: > > > > > > > > > Dear Flink developers, > > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to support > > retryable > > > > > lookup join to solve delayed updates issue, as a pre-work for this > > > > > solution, we proposed FLIP-232[2] which adds a generic retry > support > > > for > > > > > Async I/O. > > > > > We prefer to offer this retry capability via query hints, similar > to > > > new > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4]. > > > > > > > > > > This feature is backwards compatible and transparently to > connectors. > > > For > > > > > existing connectors which implements AsyncTableFunction, can easily > > > > enable > > > > > async retry via the new join hint. > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems > > > > > [2] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963 > > > > > [3] > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8 > > > > > [4] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join > > > > > > > > > > Best, > > > > > Lincoln Lee > > > > > > > > > > > > > > >