Hi Lincoln,

> It's better making decisions at the query level when a connector has both
capabilities.

Can you clarify the mechanism?
- only sync connector: What connector developers should do
- only async connector: What connector developers should do
- both async and sync connector: What connector developers should do

Best,
Jingsong

On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks for your feedback!
>
> Yes, the existing HBase connector use an option 'lookup.async' to control
> its lookup source implementations that exposed to the planner, however it's
> a private option for the HBase connector only, so it will not affect the
> common API.
>
> And as discussed in the mailing thread of FLIP-221[1], we got a consensus
> that do not make it as a common option. It's better making decisions at the
> query level when a connector has both capabilities.
>
> So if everything goes well, we should discuss it whether to deprecate
> the 'lookup.async'
> or not for HBase connector after the hint been done.
>
> This will be mentioned in the Compatibility part of this FLIP[2].
>
> WDYT?
>
> [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> [2]:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
>
> Best,
> Lincoln Lee
>
>
> Jingsong Li <jingsongl...@gmail.com> 于2022年6月1日周三 14:11写道:
>
> > Hi Lincoln,
> >
> > The unified lookup hint is what I want.
> >
> > And I like 'async'='true|false' option.
> >
> > But there is a compatibility issue, as I remember if async is currently
> > controlled by connector, and this may also require some API changes?
> >
> > We need to have a clear story for the connector combined with this
> option:
> > - only sync connector
> > - only async connector
> > - both async and sync connector
> >
> > Best,
> > Jingsong
> >
> > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <lincoln.8...@gmail.com>
> > wrote:
> >
> > > Thanks Jark for your quick response and the consensus!
> > >
> > > And I will update the FLIP after Jingsong or other developers confirm
> > that
> > > there is no problem.
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jark Wu <imj...@gmail.com> 于2022年5月30日周一 15:49写道:
> > >
> > > > Thanks for the update.
> > > >
> > > > The unified lookup hint looks good to me.
> > > > And thanks for explaining ALLOW_UNORDERED.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <lincoln.8...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jark & Jingsong,
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > 1.) support retry on sync lookup
> > > > > I also agree with supporting it, this will be useful for connectors
> > > that
> > > > > don't have asynchronous lookup implementations and can also solve
> the
> > > > ASYNC
> > > > > non-target problem to some extent(because the retrying is blocking
> > for
> > > > sync
> > > > > lookup, and may accumulate delay, but it maybe acceptable for the
> > case
> > > > that
> > > > > most or all data want to do a delayed lookup).
> > > > >
> > > > > For the api perspective, we can do some unification. Let's think of
> > the
> > > > > whole user story for lookup join:
> > > > >
> > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
> > > different
> > > > > hint option values: 'async'='true|false'
> > > > >
> > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> > > > >
> > > > > we can use one single hint LOOKUP with different hint options
> > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
> > related
> > > > > functionalities.
> > > > > Compared to multiple hints with different subsets of
> functionality, a
> > > > > single hint may be easier for users to understand and use, and
> > specific
> > > > > parameters can be quickly found through documentation
> > > > >
> > > > > the support matrix will be:
> > > > > lookup support async retry
> > > > > sync w/o retry N N
> > > > > sync w/ retry N Y
> > > > > async w/o retry Y N
> > > > > async w/ retry Y Y
> > > > >
> > > > > and the available hint options for each mode:
> > > > > mode support hint options
> > > > > async async'='true'
> > > > > 'output-mode'='ordere|allow-unordered'
> > > > > 'capacity'='100'
> > > > > 'timeout'='180s'
> > > > > retry miss-retry'='true'
> > > > > 'retry-strategy'='fixed-delay'
> > > > > 'delay'='10s'
> > > > > 'max-attempts'='3'
> > > > >
> > > > >
> > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > > 'table.exec.async-lookup.output-mode'
> > > > > Yes, make it align with DataStream Api maybe more intuitive, but
> > > there's
> > > > > some difference in table layer that makes the 'allow-unordered'
> > > > meaningful:
> > > > > updates in the pipeline need to be processed in order,
> > ALLOW_UNORDERED
> > > > > means if users allow unordered result, it will attempt to use
> > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > > > > correctness of the result, otherwise ORDERED will be still used.
> > > > >
> > > > > Another choice is that when the user specifies unordered mode,
> > planner
> > > > > throws an error when it finds that it may affect correctness. But
> > this
> > > is
> > > > > not user-friendly and is not consistent with the customary
> treatment
> > of
> > > > > invalid query hints(best effort).
> > > > >
> > > > > I opened a pr https://github.com/apache/flink/pull/19759 for the
> new
> > > > > option
> > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
> > > > FLINK-27625:
> > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
> changes
> > > > were
> > > > > relatively minor, no new flip was created)
> > > > >
> > > > > If we can reach a consensus on the single unified hint, e.g.,
> LOOKUP,
> > > > then
> > > > > FLINK-27625 can be covered.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Jark Wu <imj...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > > >
> > > > > > Hi Lincoln,
> > > > > >
> > > > > > Delayed Dim Join is a frequently requested feature, it's exciting
> > to
> > > > see
> > > > > > this feature is on the road.
> > > > > > The FLIP looks good to me in general. I only left some minor
> > > comments.
> > > > > >
> > > > > > 1) support retry for sync lookup
> > > > > > I'm also fine with the idea proposed by Jingsong. But this
> doesn't
> > > > > conflict
> > > > > > with the FLIP and can
> > > > > > be future work. It would be great if we can determine the APIs
> > first.
> > > > > >
> > > > > > 1) "allow-unordered" => "unordered"
> > > > > > I would prefer the "unordered" output mode rather than
> > > > "allow-unordered".
> > > > > > Because this fully aligns with the DataStream behaviors and
> avoids
> > > > > > confusion on the differences.
> > > > > > I understand the purpose that adding a "allow" prefix here, but I
> > > think
> > > > > the
> > > > > > semantic is fine to just
> > > > > > use "unordered" here. We didn't see any users confused about
> > > > > > OutputMode#UNORDERED.
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> jingsongl...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks Lincoln for your proposal.
> > > > > > >
> > > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g.,
> 10s
> > > > > > > max-attempts: integer, e.g., 3`.
> > > > > > >
> > > > > > > Are these options only for async? It looks like normal lookups
> > work
> > > > > too?
> > > > > > >
> > > > > > > One thing is: most of the lookup functions seem to be
> synchronous
> > > > now?
> > > > > > > There are not so many asynchronous ones?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > lincoln.8...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Considering the new common table option 'lookup.max-retries'
> > > > proposed
> > > > > > in
> > > > > > > > FLIP-221[1] which is commonly used for exception handling in
> > > > > connector
> > > > > > > > implementation, we should clearly distinguish
> > ASYNC_LOOKUP_RETRY
> > > > from
> > > > > > it
> > > > > > > to
> > > > > > > > avoid confusing users.
> > > > > > > >
> > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict
> it
> > to
> > > > > > support
> > > > > > > > retries only for lookup misses and no longer include
> exceptions
> > > > (for
> > > > > > sql
> > > > > > > > connectors, let the connector implementer decide how to
> handle
> > > > > > exceptions
> > > > > > > > since there are various kinds of retryable exceptions and can
> > not
> > > > > retry
> > > > > > > > ones).
> > > > > > > >
> > > > > > > > The FLIP[2] has been updated.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > > > > [2]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lincoln Lee
> > > > > > > >
> > > > > > > >
> > > > > > > > Lincoln Lee <lincoln.8...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > > > > > >
> > > > > > > > > Dear Flink developers,
> > > > > > > > >
> > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
> support
> > > > > > retryable
> > > > > > > > > lookup join to solve delayed updates issue, as a pre-work
> for
> > > > this
> > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
> retry
> > > > > support
> > > > > > > for
> > > > > > > > > Async I/O.
> > > > > > > > > We prefer to offer this retry capability via query hints,
> > > similar
> > > > > to
> > > > > > > new
> > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > > > > >
> > > > > > > > > This feature is backwards compatible and transparently to
> > > > > connectors.
> > > > > > > For
> > > > > > > > > existing connectors which implements AsyncTableFunction,
> can
> > > > easily
> > > > > > > > enable
> > > > > > > > > async retry via the new join hint.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > > > > [3]
> > > > > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > > > > [4]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Lincoln Lee
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to