Thanks Qingsheng and all.

I like this design.

Some comments:

1. Should LookupCache implement Serializable?

2. Minor: After FLIP-234 [1], there will likely be many connectors that
implement both PartialCachingLookupProvider and
PartialCachingAsyncLookupProvider. Can we extract a common interface
declaring `LookupCache getCache();` to keep the two consistent? (Rough
sketch below.)
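
A rough sketch of what I mean (the common interface and its name are just
placeholders, not a concrete proposal):

```
/**
 * Hypothetical common parent, so that the sync and async partial caching
 * providers expose the cache in the same way.
 */
public interface CachingLookupProvider {
    LookupCache getCache();
}

public interface PartialCachingLookupProvider
        extends LookupFunctionProvider, CachingLookupProvider {
    // createLookupFunction() stays as defined in the FLIP
}

public interface PartialCachingAsyncLookupProvider
        extends AsyncLookupFunctionProvider, CachingLookupProvider {
    // createAsyncLookupFunction() stays as defined in the FLIP
}
```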

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems

Best,
Jingsong

On Tue, Jun 21, 2022 at 4:09 PM Qingsheng Ren <re...@apache.org> wrote:
>
> Hi devs,
>
> I’d like to push FLIP-221 forward a little bit. Recently we had some offline 
> discussions and updated the FLIP. Here’s the diff compared to the previous 
> version:
>
> 1. (Async)LookupFunctionProvider is designed as a base interface for
> constructing lookup functions.
> 2. On top of it we extend PartialCaching / FullCachingLookupProvider for
> the partial and full caching modes.
> 3. Introduce CacheReloadTrigger for specifying the reload strategy in
> full caching mode, and provide 2 default implementations (Periodic /
> TimedCacheReloadTrigger).
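>
> Roughly, the new pieces fit together like this (a simplified sketch; the
> method names here are illustrative, the FLIP has the exact signatures):
>
> ```
> // Base provider constructing the lookup function.
> public interface LookupFunctionProvider
>         extends LookupTableSource.LookupRuntimeProvider {
>     LookupFunction createLookupFunction();
> }
>
> // Partial caching: lookup function plus a LookupCache managed by the framework.
> public interface PartialCachingLookupProvider extends LookupFunctionProvider {
>     LookupCache getCache();
> }
>
> // Full caching: reuse a scan source to reload the whole table, driven by a trigger.
> public interface FullCachingLookupProvider extends LookupFunctionProvider {
>     ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();
>     CacheReloadTrigger getCacheReloadTrigger();
> }
> ```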
>
> Looking forward to your replies~
>
> Best,
> Qingsheng
>
> > On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
> >
> > Hi Becket,
> >
> > Thanks for your feedback!
> >
> > 1. An alternative way is to let the cache implementation decide
> > whether to store a missing key, instead of the framework. This sounds
> > more reasonable and makes the LookupProvider interface cleaner. I can
> > update the FLIP and clarify in the JavaDoc of LookupCache#put that the
> > cache should decide whether to store an empty collection.
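> >
> > For illustration, the default cache's put() could then look roughly like
> > this (just a sketch; `cacheMissingKey` and `delegate` are assumed fields
> > of the default cache implementation, not part of the provider interface):
> >
> > ```
> > @Override
> > public Collection<RowData> put(RowData key, Collection<RowData> value) {
> >     if (value.isEmpty() && !cacheMissingKey) {
> >         // The cache itself decides not to store missing (empty) lookups.
> >         return null;
> >     }
> >     return delegate.put(key, value);
> > }
> > ```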
> >
> > 2. Initially the builder pattern was introduced for the extensibility
> > of the LookupProvider interfaces, since we might need to add more
> > configurations in the future. We can remove that builder now that we
> > have resolved the issue in 1. As for the builder in DefaultLookupCache,
> > I prefer to keep it because the constructor has a lot of arguments.
> >
> > 3. I think this might overturn the overall design. I agree with
> > Becket's idea that the API design should be layered with extensibility
> > in mind, and it would be great to have one unified interface supporting
> > partial, full and even mixed custom strategies, but we have some issues
> > to resolve. The original purpose of treating full caching separately is
> > that we'd like to reuse the ability of ScanRuntimeProvider: developers
> > just need to hand over a Source / SourceFunction / InputFormat so that
> > the framework can compose the underlying topology and control the
> > reload (maybe in a distributed way). Under your design the reload
> > operation is left entirely to the CacheStrategy, and I think it will be
> > hard for developers to reuse the source in the initializeCache method.
> >
> > Best regards,
> >
> > Qingsheng
> >
> > On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote:
> >>
> >> Thanks for updating the FLIP, Qingsheng. A few more comments:
> >>
> >> 1. I am still not sure what the use case for cacheMissingKey() is.
> >> More specifically, when would users want getCache() to return a
> >> non-empty value while cacheMissingKey() returns false?
> >>
> >> 2. The builder pattern. Usually the builder pattern is used when there are
> >> a lot of constructor variations. For example, if a class has three
> >> variables and all of them are optional, there could potentially be many
> >> combinations of the variables. But in this FLIP, I don't see such a case.
> >> What is the reason we have builders for all the classes?
> >>
> >> 3. Should the caching strategy be excluded from the top level provider API?
> >> Technically speaking, the Flink framework should only have two interfaces
> >> to deal with:
> >>    A) LookupFunction
> >>    B) AsyncLookupFunction
> >> Orthogonally, we *believe* there are two different strategies people can
> >> use for caching. Note that the Flink framework does not care what the
> >> caching strategy is here.
> >>    a) partial caching
> >>    b) full caching
> >>
> >> Putting them together, we end up with 3 combinations that we think are
> >> valid:
> >>     Aa) PartialCachingLookupFunctionProvider
> >>     Ba) PartialCachingAsyncLookupFunctionProvider
> >>     Ab) FullCachingLookupFunctionProvider
> >>
> >> However, the caching strategy could actually be quite flexible. E.g. an
> >> initial full cache load followed by some partial updates. Also, I am not
> >> 100% sure if the full caching will always use ScanTableSource. Including
> >> the caching strategy in the top level provider API would make it harder to
> >> extend.
> >>
> >> One possible solution is to just have *LookupFunctionProvider* and
> >> *AsyncLookupFunctionProvider* as the top level API, both with a
> >> getCacheStrategy() method returning an optional CacheStrategy. The
> >> CacheStrategy class would have the following methods:
> >> 1. void open(Context), where the context exposes some of the resources
> >> that may be useful for the caching strategy, e.g. an ExecutorService that
> >> is synchronized with the data processing, or a cache refresh trigger which
> >> blocks data processing and refreshes the cache.
> >> 2. void initializeCache(), a blocking method that allows users to
> >> pre-populate the cache before processing any data if they wish.
> >> 3. void maybeCache(RowData key, Collection<RowData> value), a blocking or
> >> non-blocking method.
> >> 4. void refreshCache(), a blocking / non-blocking method that is invoked by
> >> the Flink framework when the cache refresh trigger is pulled.
> >>
> >> In the above design, partial caching and full caching would be
> >> implementations of the CacheStrategy, and it is OK for users to implement
> >> their own CacheStrategy if they want to.
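> >>
> >> Put together as an interface, it could look roughly like this (the
> >> Context contents are only examples):
> >>
> >> public interface CacheStrategy {
> >>
> >>     /** Resources handed over by the framework, e.g. an executor or a refresh trigger. */
> >>     interface Context {
> >>         ExecutorService getExecutorService();
> >>     }
> >>
> >>     void open(Context context);
> >>
> >>     /** Blocking; lets users pre-populate the cache before any data is processed. */
> >>     void initializeCache();
> >>
> >>     /** Blocking or non-blocking; called for each looked-up key and its rows. */
> >>     void maybeCache(RowData key, Collection<RowData> value);
> >>
> >>     /** Blocking or non-blocking; invoked when the cache refresh trigger is pulled. */
> >>     void refreshCache();
> >> }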
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote:
> >>
> >>> Thank Qingsheng for the detailed summary and updates,
> >>>
> >>> The changes look good to me in general. I just have one minor improvement
> >>> comment: could we add static util methods to the "FullCachingReloadTrigger"
> >>> interface for quick usage?
> >>>
> >>> #periodicReloadAtFixedRate(Duration)
> >>> #periodicReloadWithFixedDelay(Duration)
> >>>
> >>> I think we can also do this for LookupCache, because users may not know
> >>> where the default implementations are and how to use them.
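> >>>
> >>> Something like this, just to sketch the idea (the implementation class
> >>> and its constructor here are assumptions):
> >>>
> >>> public interface FullCachingReloadTrigger {
> >>>
> >>>     static FullCachingReloadTrigger periodicReloadAtFixedRate(Duration interval) {
> >>>         return new PeriodicReloadTrigger(interval, ScheduleMode.FIXED_RATE);
> >>>     }
> >>>
> >>>     static FullCachingReloadTrigger periodicReloadWithFixedDelay(Duration interval) {
> >>>         return new PeriodicReloadTrigger(interval, ScheduleMode.FIXED_DELAY);
> >>>     }
> >>> }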
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote:
> >>>
> >>>> Hi Jingsong,
> >>>>
> >>>> Thanks for your comments!
> >>>>
> >>>>> AllCache definition is not flexible, for example, PartialCache can use
> >>>>> any custom storage while AllCache cannot. AllCache could also store data
> >>>>> in memory or on disk, so it also needs a flexible strategy.
> >>>>
> >>>> We had an offline discussion with Jark and Leonard. Basically we think
> >>>> exposing the interface of the full cache storage to connector developers
> >>>> might limit our future optimizations. The storage for full caching
> >>>> shouldn't have too many variations across different lookup tables, so
> >>>> making it pluggable might not help a lot. Also, I think it is not easy
> >>>> for connector developers to implement such an optimized storage. We can
> >>>> keep optimizing this storage in the future and all full caching lookup
> >>>> tables would benefit from it.
> >>>>
> >>>>> We are more inclined to deprecate the connector `async` option when
> >>>> discussing FLIP-234. Can we remove this option from this FLIP?
> >>>>
> >>>> Thanks for the reminder! This option has been removed in the latest
> >>>> version.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Qingsheng
> >>>>
> >>>>
> >>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote:
> >>>>>
> >>>>> Thanks Alexander for your reply. We can discuss the new interface when
> >>> it
> >>>>> comes out.
> >>>>>
> >>>>> While discussing FLIP-234 [1] we became more inclined to deprecate the
> >>>>> connector `async` option; we should use hints to let the planner decide.
> >>>>> Although that discussion has not yet produced a conclusion, can we remove
> >>>>> this option from this FLIP? It doesn't seem to be related to this FLIP,
> >>>>> but more to FLIP-234, and we can form a conclusion over there.
> >>>>>
> >>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> >>>>>
> >>>>> Best,
> >>>>> Jingsong
> >>>>>
> >>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote:
> >>>>>
> >>>>>> Hi Jark,
> >>>>>>
> >>>>>> Thanks for clarifying it. It would be fine as long as we could provide
> >>>>>> the no-cache solution. I was just wondering if the client side cache
> >>>>>> could really help when HBase is used, since the data to look up should
> >>>>>> be huge. Depending on how much data is cached on the client side, the
> >>>>>> data that should be LRU in e.g. LruBlockCache will not be LRU anymore.
> >>>>>> In the worst case, once the cached data on the client side expires, the
> >>>>>> request will hit disk, which will cause extra latency temporarily, if I
> >>>>>> am not mistaken.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Jing
> >>>>>>
> >>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Jing Ge,
> >>>>>>>
> >>>>>>> What do you mean about the "impact on the block cache used by HBase"?
> >>>>>>> In my understanding, the connector cache and the HBase cache are two
> >>>>>>> totally different things.
> >>>>>>> The connector cache is a local/client cache, and the HBase cache is a
> >>>>>>> server cache.
> >>>>>>>
> >>>>>>>> does it make sense to have a no-cache solution as one of the
> >>>>>>> default solutions so that customers will have no effort for the
> >>>> migration
> >>>>>>> if they want to stick with Hbase cache
> >>>>>>>
> >>>>>>> The implementation migration should be transparent to users. Take the
> >>>>>> HBase
> >>>>>>> connector as
> >>>>>>> an example,  it already supports lookup cache but is disabled by
> >>>> default.
> >>>>>>> After migration, the
> >>>>>>> connector still disables cache by default (i.e. no-cache solution).
> >>> No
> >>>>>>> migration effort for users.
> >>>>>>>
> >>>>>>> HBase cache and connector cache are two different things. The HBase
> >>>>>>> cache can't simply replace the connector cache, because one of the most
> >>>>>>> important usages of the connector cache is reducing I/O
> >>>>>>> requests/responses and improving throughput, which cannot be achieved
> >>>>>>> by just using a server-side cache.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote:
> >>>>>>>
> >>>>>>>> Thanks all for the valuable discussion. The new feature looks very
> >>>>>>>> interesting.
> >>>>>>>>
> >>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive
> >>> and
> >>>>>>> HBase
> >>>>>>>> connector implemented lookup table source. All existing
> >>>> implementations
> >>>>>>>> will be migrated to the current design and the migration will be
> >>>>>>>> transparent to end users*." I was only wondering if we should pay
> >>>>>>>> attention to HBase and similar DBs. Since the lookup data is commonly
> >>>>>>>> huge when HBase is used, partial caching will be used in this case, if
> >>>>>>>> I am not mistaken, which might have an impact on the block cache used
> >>>>>>>> by HBase, e.g. LruBlockCache.
> >>>>>>>> Another question: since HBase provides a sophisticated cache solution,
> >>>>>>>> does it make sense to have a no-cache solution as one of the default
> >>>>>>>> solutions, so that customers will have no migration effort if they
> >>>>>>>> want to stick with the HBase cache?
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Jing
> >>>>>>>>
> >>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li <
> >>> jingsongl...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I think the problems now are as follows:
> >>>>>>>>> 1. The AllCache and PartialCache interfaces are not uniform: one needs
> >>>>>>>>> to provide a LookupProvider, the other needs to provide a CacheBuilder.
> >>>>>>>>> 2. The AllCache definition is not flexible: for example, PartialCache
> >>>>>>>>> can use any custom storage while AllCache cannot. AllCache could also
> >>>>>>>>> store data in memory or on disk, so it also needs a flexible strategy.
> >>>>>>>>> 3. AllCache cannot customize the ReloadStrategy; currently there is
> >>>>>>>>> only ScheduledReloadStrategy.
> >>>>>>>>>
> >>>>>>>>> In order to solve the above problems, the following are my ideas.
> >>>>>>>>>
> >>>>>>>>> ## Top level cache interfaces:
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> public interface CacheLookupProvider extends
> >>>>>>>>> LookupTableSource.LookupRuntimeProvider {
> >>>>>>>>>
> >>>>>>>>>   CacheBuilder createCacheBuilder();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface CacheBuilder {
> >>>>>>>>>   Cache create();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface Cache {
> >>>>>>>>>
> >>>>>>>>>   /**
> >>>>>>>>>    * Returns the value associated with key in this cache, or null if
> >>>>>>>>>    * there is no cached value for key.
> >>>>>>>>>    */
> >>>>>>>>>   @Nullable
> >>>>>>>>>   Collection<RowData> getIfPresent(RowData key);
> >>>>>>>>>
> >>>>>>>>>   /** Returns the number of key-value mappings in the cache. */
> >>>>>>>>>   long size();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> ## Partial cache
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> public interface PartialCacheLookupFunction extends CacheLookupProvider {
> >>>>>>>>>
> >>>>>>>>>   @Override
> >>>>>>>>>   PartialCacheBuilder createCacheBuilder();
> >>>>>>>>>
> >>>>>>>>> /** Creates an {@link LookupFunction} instance. */
> >>>>>>>>> LookupFunction createLookupFunction();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder {
> >>>>>>>>>
> >>>>>>>>>   PartialCache create();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface PartialCache extends Cache {
> >>>>>>>>>
> >>>>>>>>>   /**
> >>>>>>>>>    * Associates the specified value rows with the specified key row in
> >>>>>>>>>    * the cache. If the cache previously contained a value associated
> >>>>>>>>>    * with the key, the old value is replaced by the specified value.
> >>>>>>>>>    *
> >>>>>>>>>    * @param key key row with which the specified value is to be associated
> >>>>>>>>>    * @param value value rows to be associated with the specified key
> >>>>>>>>>    * @return the previous value rows associated with key, or null if
> >>>>>>>>>    *     there was no mapping for key
> >>>>>>>>>    */
> >>>>>>>>>   Collection<RowData> put(RowData key, Collection<RowData> value);
> >>>>>>>>>
> >>>>>>>>>   /** Discards any cached value for the specified key. */
> >>>>>>>>>   void invalidate(RowData key);
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> ## All cache
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> public interface AllCacheLookupProvider extends CacheLookupProvider {
> >>>>>>>>>
> >>>>>>>>>   void registerReloadStrategy(ScheduledExecutorService
> >>>>>>>>> executorService, Reloader reloader);
> >>>>>>>>>
> >>>>>>>>>   ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider();
> >>>>>>>>>
> >>>>>>>>>   @Override
> >>>>>>>>>   AllCacheBuilder createCacheBuilder();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface AllCacheBuilder extends CacheBuilder {
> >>>>>>>>>
> >>>>>>>>>   AllCache create();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface AllCache extends Cache {
> >>>>>>>>>
> >>>>>>>>>   void putAll(Iterator<Map<RowData, RowData>> allEntries);
> >>>>>>>>>
> >>>>>>>>>   void clearAll();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> public interface Reloader {
> >>>>>>>>>
> >>>>>>>>>   void reload();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jingsong
> >>>>>>>>>
> >>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li <
> >>> jingsongl...@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Qingsheng and all for your discussion.
> >>>>>>>>>>
> >>>>>>>>>> Very sorry to jump in so late.
> >>>>>>>>>>
> >>>>>>>>>> Maybe I missed something?
> >>>>>>>>>> My first impression when I saw the cache interface was: why don't we
> >>>>>>>>>> provide an interface similar to Guava cache [1]? On top of Guava
> >>>>>>>>>> cache, Caffeine also adds extensions for asynchronous calls [2], and
> >>>>>>>>>> Caffeine supports bulk loading too.
> >>>>>>>>>>
> >>>>>>>>>> I am also confused why we first go from LookupCacheFactory.Builder to
> >>>>>>>>>> a Factory and then create the Cache.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://github.com/google/guava
> >>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jingsong
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com>
> >>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> After looking at the newly introduced ReloadTime and Becket's
> >>>>>>>>>>> comment, I agree with Becket that we should have a pluggable
> >>>>>>>>>>> reloading strategy. We can provide some common implementations,
> >>>>>>>>>>> e.g. periodic reloading and daily reloading.
> >>>>>>>>>>> But there will definitely be some connector- or business-specific
> >>>>>>>>>>> reloading strategies, e.g. being notified by a ZooKeeper watcher, or
> >>>>>>>>>>> reloading once a new Hive partition is complete.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Jark
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and
> >>>>>>>> "XXXProvider".
> >>>>>>>>>>>> What is the difference between them? If they are the same, can
> >>>>>> we
> >>>>>>>> just
> >>>>>>>>>>> use
> >>>>>>>>>>>> XXXFactory everywhere?
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Regarding the FullCachingLookupProvider, should the reloading
> >>>>>>>>>>>> policy also be pluggable? Periodic reloading can sometimes be
> >>>>>>>>>>>> tricky in practice. For example, if a user sets 24 hours as the
> >>>>>>>>>>>> cache refresh interval and some nightly batch job is delayed, the
> >>>>>>>>>>>> cache update may still read stale data.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. In DefaultLookupCacheFactory, it looks like InitialCapacity
> >>>>>>>> should
> >>>>>>>>> be
> >>>>>>>>>>>> removed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey() seems a
> >>>>>>>>>>>> little confusing to me. If Optional<LookupCacheFactory>
> >>>>>>>>>>>> getCacheFactory() returns a non-empty factory, doesn't that already
> >>>>>>>>>>>> tell the framework to cache the missing keys? Also, why does this
> >>>>>>>>>>>> method return an Optional<Boolean> instead of a boolean?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <
> >>>>>> renqs...@gmail.com
> >>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Lincoln and Jark,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus
> >>>>>>> that
> >>>>>>>> we
> >>>>>>>>>>> use
> >>>>>>>>>>>>> SQL hint instead of table options to decide whether to use sync
> >>>>>>> or
> >>>>>>>>>>> async
> >>>>>>>>>>>>> mode, it’s indeed not necessary to introduce the “lookup.async”
> >>>>>>>>> option.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think it's a good idea to let the decision about async be made at
> >>>>>>>>>>>>> the query level, which allows better optimization with the
> >>>>>>>>>>>>> additional information gathered by the planner. Is there any FLIP
> >>>>>>>>>>>>> describing the issue in FLINK-27625? I thought FLIP-234 only
> >>>>>>>>>>>>> proposes adding a SQL hint for retry on missing, rather than
> >>>>>>>>>>>>> controlling the entire async mode by hint.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <
> >>>>>> lincoln.8...@gmail.com
> >>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jark,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for your reply!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Currently 'lookup.async' just lies in HBase connector, I have
> >>>>>>> no
> >>>>>>>>> idea
> >>>>>>>>>>>>>> whether or when to remove it (we can discuss it in another
> >>>>>>> issue
> >>>>>>>>> for
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> HBase connector after FLINK-27625 is done), just not add it
> >>>>>>> into
> >>>>>>>> a
> >>>>>>>>>>>>> common
> >>>>>>>>>>>>>> option now.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Lincoln Lee
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> 于2022年5月24日周二 20:14写道:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi Lincoln,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that
> >>>>>> the
> >>>>>>>>>>>>> connectors
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>> provide both async and sync runtime providers simultaneously
> >>>>>>>>> instead
> >>>>>>>>>>>>> of one
> >>>>>>>>>>>>>>> of them.
> >>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant. If this
> >>>>>> option
> >>>>>>> is
> >>>>>>>>>>>>> planned to
> >>>>>>>>>>>>>>> be removed
> >>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it
> >>>>>>> in
> >>>>>>>>> this
> >>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee <
> >>>>>>>> lincoln.8...@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good idea
> >>>>>>>>>>>>>>>> that we can have a common table option. I have one minor comment
> >>>>>>>>>>>>>>>> on 'lookup.async': let's not make it a common option.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup
> >>>>>>>>>>>>>>>> capabilities, and connector implementers can choose one or both.
> >>>>>>>>>>>>>>>> In the case of implementing only one capability (the status of
> >>>>>>>>>>>>>>>> most existing built-in connectors), 'lookup.async' will not be
> >>>>>>>>>>>>>>>> used. And when a connector has both capabilities, I think this
> >>>>>>>>>>>>>>>> choice is more suitable for making decisions at the query level;
> >>>>>>>>>>>>>>>> for example, the table planner can choose the physical
> >>>>>>>>>>>>>>>> implementation of async or sync lookup based on its cost model,
> >>>>>>>>>>>>>>>> or users can give a query hint based on their own better
> >>>>>>>>>>>>>>>> understanding. If there is another common table option
> >>>>>>>>>>>>>>>> 'lookup.async', it may confuse users in the long run.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in a private
> >>>>>>>>>>>>>>>> place (for the current HBase connector) and not turn it into a
> >>>>>>>>>>>>>>>> common option.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Lincoln Lee
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年5月23日周一 14:54写道:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Alexander,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and you can
> >>>>>>>>>>>>>>>>> find those changes in my latest email. Since some terminology
> >>>>>>>>>>>>>>>>> has changed, I'll use the new concepts when replying to your
> >>>>>>>>>>>>>>>>> comments.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 1. Builder vs ‘of’
> >>>>>>>>>>>>>>>>> I’m OK to use builder pattern if we have additional
> >>>>>> optional
> >>>>>>>>>>>>> parameters
> >>>>>>>>>>>>>>>>> for full caching mode (“rescan” previously). The
> >>>>>>>>>>> schedule-with-delay
> >>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>> looks reasonable to me, but I think we need to redesign
> >>>>>> the
> >>>>>>>>>>> builder
> >>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> full caching to make it more descriptive for developers.
> >>>>>>> Would
> >>>>>>>>> you
> >>>>>>>>>>>>> mind
> >>>>>>>>>>>>>>>>> sharing your ideas about the API? For accessing the FLIP
> >>>>>>>>> workspace
> >>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> just provide your account ID and ping any PMC member
> >>>>>>> including
> >>>>>>>>>>> Jark.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2. Common table options
> >>>>>>>>>>>>>>>>> We have some discussions these days and propose to
> >>>>>>> introduce 8
> >>>>>>>>>>> common
> >>>>>>>>>>>>>>>>> table options about caching. It has been updated on the
> >>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 3. Retries
> >>>>>>>>>>>>>>>>> I think we are on the same page :-)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For your additional concerns:
> >>>>>>>>>>>>>>>>> 1) The table option has been updated.
> >>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to
> >>>>>> use
> >>>>>>>>>>> partial
> >>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>> full caching mode.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов <
> >>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Also I have a few additions:
> >>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to
> >>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? I think it will be more clear
> >>>>>> that
> >>>>>>>> we
> >>>>>>>>>>> talk
> >>>>>>>>>>>>>>>>>> not about bytes, but about the number of rows. Plus it
> >>>>>> fits
> >>>>>>>>> more,
> >>>>>>>>>>>>>>>>>> considering my optimization with filters.
> >>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to
> >>>>>>> separate
> >>>>>>>>>>>>> caching
> >>>>>>>>>>>>>>>>>> and rescanning from the options point of view? Like
> >>>>>>> initially
> >>>>>>>>> we
> >>>>>>>>>>> had
> >>>>>>>>>>>>>>>>>> one option 'lookup.cache' with values LRU / ALL. I think
> >>>>>>> now
> >>>>>>>> we
> >>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>> make a boolean option 'lookup.rescan'. RescanInterval can
> >>>>>>> be
> >>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. в 14:50, Александр Смирнов <
> >>>>>>>>>>> smirale...@gmail.com
> >>>>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. Builders vs 'of'
> >>>>>>>>>>>>>>>>>>> I understand that builders are used when we have
> >>>>>> multiple
> >>>>>>>>>>>>>>> parameters.
> >>>>>>>>>>>>>>>>>>> I suggested them because we could add parameters later.
> >>>>>> To
> >>>>>>>>>>> prevent
> >>>>>>>>>>>>>>>>>>> Builder for ScanRuntimeProvider from looking redundant I
> >>>>>>> can
> >>>>>>>>>>>>> suggest
> >>>>>>>>>>>>>>>>>>> one more config now - "rescanStartTime".
> >>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) when the first
> >>>>>> reload
> >>>>>>>> of
> >>>>>>>>>>> cache
> >>>>>>>>>>>>>>>>>>> starts. This parameter can be thought of as 'initialDelay'
> >>>>>>>>>>>>>>>>>>> (the diff between the current time and rescanStartTime) in the
> >>>>>>>>>>>>>>>>>>> method ScheduledExecutorService#scheduleWithFixedDelay [1]. It
> >>>>>>> can
> >>>>>>>> be
> >>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>> useful when the dimension table is updated by some other
> >>>>>>>>>>> scheduled
> >>>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>>> at a certain time. Or when the user simply wants a
> >>>>>> second
> >>>>>>>> scan
> >>>>>>>>>>>>>>> (first
> >>>>>>>>>>>>>>>>>>> cache reload) be delayed. This option can be used even
> >>>>>>>> without
> >>>>>>>>>>>>>>>>>>> 'rescanInterval' - in this case 'rescanInterval' will be
> >>>>>>> one
> >>>>>>>>>>> day.
> >>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad
> >>>>>> if
> >>>>>>>> you
> >>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>> give me access to edit FLIP page, so I could add it
> >>>>>> myself
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. Common table options
> >>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all
> >>>>>>>> cache
> >>>>>>>>>>>>>>>>>>> options. But maybe unify all suggested options, not only
> >>>>>>> for
> >>>>>>>>>>>>> default
> >>>>>>>>>>>>>>>>>>> cache? I.e. class 'LookupOptions', that unifies default
> >>>>>>>> cache
> >>>>>>>>>>>>>>> options,
> >>>>>>>>>>>>>>>>>>> rescan options, 'async', 'maxRetries'. WDYT?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 3. Retries
> >>>>>>>>>>>>>>>>>>> I'm fine with suggestion close to
> >>>>>>> RetryUtils#tryTimes(times,
> >>>>>>>>>>> call)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> [1] https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit-
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren <
> >>>>>>>> renqs...@gmail.com
> >>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for your comments! I’m also OK to introduce
> >>>>>> common
> >>>>>>>>> table
> >>>>>>>>>>>>>>>>> options. I prefer to introduce a new
> >>>>>>> DefaultLookupCacheOptions
> >>>>>>>>>>> class
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>> holding these option definitions because putting all
> >>>>>> options
> >>>>>>>>> into
> >>>>>>>>>>>>>>>>> FactoryUtil would make it a bit ”crowded” and not well
> >>>>>>>>>>> categorized.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> FLIP has been updated according to suggestions above:
> >>>>>>>>>>>>>>>>>>>> 1. Use static “of” method for constructing
> >>>>>>>>>>> RescanRuntimeProvider
> >>>>>>>>>>>>>>>>> considering both arguments are required.
> >>>>>>>>>>>>>>>>>>>> 2. Introduce new table options matching
> >>>>>>>>>>> DefaultLookupCacheFactory
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Wed, May 18, 2022 at 2:57 PM Jark Wu <
> >>>>>>> imj...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1) retry logic
> >>>>>>>>>>>>>>>>>>>>> I think we can extract some common retry logic into
> >>>>>>>>> utilities,
> >>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>> RetryUtils#tryTimes(times, call).
> >>>>>>>>>>>>>>>>>>>>> This seems independent of this FLIP and can be reused
> >>>>>> by
> >>>>>>>>>>>>>>> DataStream
> >>>>>>>>>>>>>>>>> users.
> >>>>>>>>>>>>>>>>>>>>> Maybe we can open an issue to discuss this and where
> >>>>>> to
> >>>>>>>> put
> >>>>>>>>>>> it.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2) cache ConfigOptions
> >>>>>>>>>>>>>>>>>>>>> I'm fine with defining cache config options in the
> >>>>>>>>> framework.
> >>>>>>>>>>>>>>>>>>>>> A candidate place to put is FactoryUtil which also
> >>>>>>>> includes
> >>>>>>>>>>>>>>>>> "sink.parallelism", "format" options.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, 18 May 2022 at 13:52, Александр Смирнов <
> >>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thank you for considering my comments.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> there might be custom logic before making retry,
> >>>>>> such
> >>>>>>> as
> >>>>>>>>>>>>>>>>> re-establish the connection
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Yes, I understand that. I meant that such logic can
> >>>>>> be
> >>>>>>>>>>> placed in
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> separate function, that can be implemented by
> >>>>>>> connectors.
> >>>>>>>>>>> Just
> >>>>>>>>>>>>>>>> moving
> >>>>>>>>>>>>>>>>>>>>>> the retry logic would make connector's LookupFunction
> >>>>>>>> more
> >>>>>>>>>>>>>>> concise
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>>>>>>>> avoid duplicate code. However, it's a minor change.
> >>>>>> The
> >>>>>>>>>>> decision
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> up
> >>>>>>>>>>>>>>>>>>>>>> to you.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let
> >>>>>>>>>>> developers
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> define their own options as we do now per connector.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> What is the reason for that? One of the main goals of
> >>>>>>>> this
> >>>>>>>>>>> FLIP
> >>>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> unify the configs, wasn't it? I understand that
> >>>>>> current
> >>>>>>>>> cache
> >>>>>>>>>>>>>>>> design
> >>>>>>>>>>>>>>>>>>>>>> doesn't depend on ConfigOptions, like was before. But
> >>>>>>>> still
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> put
> >>>>>>>>>>>>>>>>>>>>>> these options into the framework, so connectors can
> >>>>>>> reuse
> >>>>>>>>>>> them
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>> avoid code duplication, and, what is more
> >>>>>> significant,
> >>>>>>>>> avoid
> >>>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>>>>>>> different options naming. This moment can be pointed
> >>>>>>> out
> >>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>> documentation for connector developers.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>> Alexander
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> вт, 17 мая 2022 г. в 17:11, Qingsheng Ren <
> >>>>>>>>>>> renqs...@gmail.com>:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi Alexander,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for the review and glad to see we are on the
> >>>>>>> same
> >>>>>>>>>>> page!
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>> think you forgot to cc the dev mailing list so I’m also
> >>>>>>>> quoting
> >>>>>>>>>>> your
> >>>>>>>>>>>>>>>> reply
> >>>>>>>>>>>>>>>>> under this email.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> We can add 'maxRetryTimes' option into this class
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> In my opinion the retry logic should be implemented
> >>>>>> in
> >>>>>>>>>>> lookup()
> >>>>>>>>>>>>>>>>> instead of in LookupFunction#eval(). Retrying is only
> >>>>>>>> meaningful
> >>>>>>>>>>>>> under
> >>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>> specific retriable failures, and there might be custom
> >>>>>> logic
> >>>>>>>>>>> before
> >>>>>>>>>>>>>>>> making
> >>>>>>>>>>>>>>>>> retry, such as re-establish the connection
> >>>>>>>>>>> (JdbcRowDataLookupFunction
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>> example), so it's more handy to leave it to the connector.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I don't see DDL options, that were in previous
> >>>>>>> version
> >>>>>>>> of
> >>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>> Do
> >>>>>>>>>>>>>>>>> you have any special plans for them?
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let
> >>>>>>>>>>> developers
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> define their own options as we do now per connector.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> The rest of comments sound great and I’ll update the
> >>>>>>>> FLIP.
> >>>>>>>>>>> Hope
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> can finalize our proposal soon!
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On May 17, 2022, at 13:46, Александр Смирнов <
> >>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng and devs!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I like the overall design of updated FLIP, however
> >>>>>> I
> >>>>>>>> have
> >>>>>>>>>>>>>>> several
> >>>>>>>>>>>>>>>>>>>>>>>> suggestions and questions.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 1) Introducing LookupFunction as a subclass of
> >>>>>>>>>>> TableFunction
> >>>>>>>>>>>>>>> is a
> >>>>>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>>>>>>>>>> idea. We can add 'maxRetryTimes' option into this
> >>>>>>>> class.
> >>>>>>>>>>>>> 'eval'
> >>>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>> of new LookupFunction is great for this purpose.
> >>>>>> The
> >>>>>>>> same
> >>>>>>>>>>> is
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>> 'async' case.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 2) There might be other configs in future, such as
> >>>>>>>>>>>>>>>>> 'cacheMissingKey'
> >>>>>>>>>>>>>>>>>>>>>>>> in LookupFunctionProvider or 'rescanInterval' in
> >>>>>>>>>>>>>>>>> ScanRuntimeProvider.
> >>>>>>>>>>>>>>>>>>>>>>>> Maybe use Builder pattern in LookupFunctionProvider
> >>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> RescanRuntimeProvider for more flexibility (use one
> >>>>>>>>> 'build'
> >>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>> instead of many 'of' methods in future)?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 3) What are the plans for existing
> >>>>>>>> TableFunctionProvider
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> AsyncTableFunctionProvider? I think they should be
> >>>>>>>>>>> deprecated.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 4) Am I right that the current design does not
> >>>>>> assume
> >>>>>>>>>>> usage of
> >>>>>>>>>>>>>>>>>>>>>>>> user-provided LookupCache in re-scanning? In this
> >>>>>>> case,
> >>>>>>>>> it
> >>>>>>>>>>> is
> >>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>> very
> >>>>>>>>>>>>>>>>>>>>>>>> clear why do we need methods such as 'invalidate'
> >>>>>> or
> >>>>>>>>>>> 'putAll'
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> LookupCache.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 5) I don't see DDL options, that were in previous
> >>>>>>>> version
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>> Do
> >>>>>>>>>>>>>>>>>>>>>>>> you have any special plans for them?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> If you don't mind, I would be glad to be able to
> >>>>>> make
> >>>>>>>>> small
> >>>>>>>>>>>>>>>>>>>>>>>> adjustments to the FLIP document too. I think it's
> >>>>>>>> worth
> >>>>>>>>>>>>>>>> mentioning
> >>>>>>>>>>>>>>>>>>>>>>>> about what exactly optimizations are planning in
> >>>>>> the
> >>>>>>>>>>> future.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> пт, 13 мая 2022 г. в 20:27, Qingsheng Ren <
> >>>>>>>>>>> renqs...@gmail.com
> >>>>>>>>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Alexander and devs,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for the in-depth discussion!
> >>>>>> As
> >>>>>>>> Jark
> >>>>>>>>>>>>>>>>> mentioned we were inspired by Alexander's idea and made a
> >>>>>>>>>>> refactor on
> >>>>>>>>>>>>>>> our
> >>>>>>>>>>>>>>>>> design. FLIP-221 [1] has been updated to reflect our
> >>>>>> design
> >>>>>>>> now
> >>>>>>>>>>> and
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>> happy to hear more suggestions from you!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Compared to the previous design:
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. The lookup cache serves at table runtime level
> >>>>>>> and
> >>>>>>>> is
> >>>>>>>>>>>>>>>>> integrated as a component of LookupJoinRunner as discussed
> >>>>>>>>>>>>> previously.
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Interfaces are renamed and re-designed to
> >>>>>> reflect
> >>>>>>>> the
> >>>>>>>>>>> new
> >>>>>>>>>>>>>>>>> design.
> >>>>>>>>>>>>>>>>>>>>>>>>> 3. We separate the all-caching case individually
> >>>>>> and
> >>>>>>>>>>>>>>> introduce a
> >>>>>>>>>>>>>>>>> new RescanRuntimeProvider to reuse the ability of
> >>>>>> scanning.
> >>>>>>> We
> >>>>>>>>> are
> >>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>> to support SourceFunction / InputFormat for now
> >>>>>> considering
> >>>>>>>> the
> >>>>>>>>>>>>>>>> complexity
> >>>>>>>>>>>>>>>>> of FLIP-27 Source API.
> >>>>>>>>>>>>>>>>>>>>>>>>> 4. A new interface LookupFunction is introduced to
> >>>>>>>> make
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> semantic of lookup more straightforward for developers.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> For replying to Alexander:
> >>>>>>>>>>>>>>>>>>>>>>>>>> However I'm a little confused whether InputFormat
> >>>>>>> is
> >>>>>>>>>>>>>>> deprecated
> >>>>>>>>>>>>>>>>> or not. Am I right that it will be so in the future, but
> >>>>>>>>> currently
> >>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>> not?
> >>>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. InputFormat is not deprecated
> >>>>>> for
> >>>>>>>>> now.
> >>>>>>>>>>> I
> >>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>> it will be deprecated in the future but we don't have a
> >>>>>>> clear
> >>>>>>>>> plan
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> that.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks again for the discussion on this FLIP and
> >>>>>>>> looking
> >>>>>>>>>>>>>>> forward
> >>>>>>>>>>>>>>>>> to cooperating with you after we finalize the design and
> >>>>>>>>>>> interfaces!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 13, 2022 at 12:12 AM Александр
> >>>>>> Смирнов <
> >>>>>>>>>>>>>>>>> smirale...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark, Qingsheng and Leonard!
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Glad to see that we came to a consensus on almost
> >>>>>>> all
> >>>>>>>>>>>>> points!
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> However I'm a little confused whether InputFormat
> >>>>>>> is
> >>>>>>>>>>>>>>> deprecated
> >>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>>> not. Am I right that it will be so in the future,
> >>>>>>> but
> >>>>>>>>>>>>>>> currently
> >>>>>>>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>>>>>>>>>>>> not? Actually I also think that for the first
> >>>>>>> version
> >>>>>>>>>>> it's
> >>>>>>>>>>>>> OK
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat in ALL cache realization, because
> >>>>>>>>> supporting
> >>>>>>>>>>>>>>> rescan
> >>>>>>>>>>>>>>>>>>>>>>>>>> ability seems like a very distant prospect. But
> >>>>>> for
> >>>>>>>>> this
> >>>>>>>>>>>>>>>>> decision we
> >>>>>>>>>>>>>>>>>>>>>>>>>> need a consensus among all discussion
> >>>>>> participants.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> In general, I don't have something to argue with
> >>>>>>> your
> >>>>>>>>>>>>>>>>> statements. All
> >>>>>>>>>>>>>>>>>>>>>>>>>> of them correspond my ideas. Looking ahead, it
> >>>>>>> would
> >>>>>>>> be
> >>>>>>>>>>> nice
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> work
> >>>>>>>>>>>>>>>>>>>>>>>>>> on this FLIP cooperatively. I've already done a
> >>>>>> lot
> >>>>>>>> of
> >>>>>>>>>>> work
> >>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>> lookup
> >>>>>>>>>>>>>>>>>>>>>>>>>> join caching with realization very close to the
> >>>>>> one
> >>>>>>>> we
> >>>>>>>>>>> are
> >>>>>>>>>>>>>>>>> discussing,
> >>>>>>>>>>>>>>>>>>>>>>>>>> and want to share the results of this work.
> >>>>>> Anyway
> >>>>>>>>>>> looking
> >>>>>>>>>>>>>>>>> forward for
> >>>>>>>>>>>>>>>>>>>>>>>>>> the FLIP update!
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> чт, 12 мая 2022 г. в 17:38, Jark Wu <
> >>>>>>>> imj...@gmail.com
> >>>>>>>>>> :
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for summarizing your points.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the past week, Qingsheng, Leonard, and I have
> >>>>>>>>>>> discussed
> >>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> several times
> >>>>>>>>>>>>>>>>>>>>>>>>>>> and we have totally refactored the design.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm glad to say we have reached a consensus on
> >>>>>>> many
> >>>>>>>> of
> >>>>>>>>>>> your
> >>>>>>>>>>>>>>>>> points!
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Qingsheng is still working on updating the
> >>>>>> design
> >>>>>>>> docs
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> maybe can be
> >>>>>>>>>>>>>>>>>>>>>>>>>>> available in the next few days.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I will share some conclusions from our
> >>>>>>> discussions:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1) we have refactored the design towards to
> >>>>>> "cache
> >>>>>>>> in
> >>>>>>>>>>>>>>>>> framework" way.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2) a "LookupCache" interface for users to
> >>>>>>> customize
> >>>>>>>>> and
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> default
> >>>>>>>>>>>>>>>>>>>>>>>>>>> implementation with builder for users to
> >>>>>> easy-use.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> This can both make it possible to both have
> >>>>>>>>> flexibility
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> conciseness.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Filter pushdown is important for ALL and LRU
> >>>>>>>> lookup
> >>>>>>>>>>>>>>> cache,
> >>>>>>>>>>>>>>>>> esp reducing
> >>>>>>>>>>>>>>>>>>>>>>>>>>> IO.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Filter pushdown should be the final state and
> >>>>>> the
> >>>>>>>>>>> unified
> >>>>>>>>>>>>>>> way
> >>>>>>>>>>>>>>>>> to both
> >>>>>>>>>>>>>>>>>>>>>>>>>>> support pruning ALL cache and LRU cache,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> so I think we should make effort in this
> >>>>>>> direction.
> >>>>>>>> If
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>> to support
> >>>>>>>>>>>>>>>>>>>>>>>>>>> filter pushdown for ALL cache anyway, why not
> >>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>>>> it for LRU cache as well? Either way, as we
> >>>>>> decide
> >>>>>>>> to
> >>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>> the cache
> >>>>>>>>>>>>>>>>>>>>>>>>>>> in the framework, we have the chance to support
> >>>>>>>>>>>>>>>>>>>>>>>>>>> filter on cache anytime. This is an optimization
> >>>>>>> and
> >>>>>>>>> it
> >>>>>>>>>>>>>>>> doesn't
> >>>>>>>>>>>>>>>>> affect the
> >>>>>>>>>>>>>>>>>>>>>>>>>>> public API. I think we can create a JIRA issue
> >>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>>>>> discuss it when the FLIP is accepted.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 4) The idea to support ALL cache is similar to
> >>>>>>> your
> >>>>>>>>>>>>>>> proposal.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the first version, we will only support
> >>>>>>>>> InputFormat,
> >>>>>>>>>>>>>>>>> SourceFunction for
> >>>>>>>>>>>>>>>>>>>>>>>>>>> cache all (invoke InputFormat in join operator).
> >>>>>>>>>>>>>>>>>>>>>>>>>>> For FLIP-27 source, we need to join a true
> >>>>>> source
> >>>>>>>>>>> operator
> >>>>>>>>>>>>>>>>> instead of
> >>>>>>>>>>>>>>>>>>>>>>>>>>> calling it embedded in the join operator.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> However, this needs another FLIP to support the
> >>>>>>>>> re-scan
> >>>>>>>>>>>>>>>> ability
> >>>>>>>>>>>>>>>>> for FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Source, and this can be a large work.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> In order to not block this issue, we can put the
> >>>>>>>>> effort
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> FLIP-27 source
> >>>>>>>>>>>>>>>>>>>>>>>>>>> integration into future work and integrate
> >>>>>>>>>>>>>>>>>>>>>>>>>>> InputFormat&SourceFunction for now.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think it's fine to use
> >>>>>>> InputFormat&SourceFunction,
> >>>>>>>>> as
> >>>>>>>>>>>>> they
> >>>>>>>>>>>>>>>>> are not
> >>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated, otherwise, we have to introduce
> >>>>>>> another
> >>>>>>>>>>>>> function
> >>>>>>>>>>>>>>>>>>>>>>>>>>> similar to them which is meaningless. We need to
> >>>>>>>> plan
> >>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>> source
> >>>>>>>>>>>>>>>>>>>>>>>>>>> integration ASAP before InputFormat &
> >>>>>>> SourceFunction
> >>>>>>>>> are
> >>>>>>>>>>>>>>>>> deprecated.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 15:46, Александр Смирнов
> >>>>>> <
> >>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Martijn!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Got it. Therefore, the realization with
> >>>>>>> InputFormat
> >>>>>>>>> is
> >>>>>>>>>>> not
> >>>>>>>>>>>>>>>>> considered.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for clearing that up!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Smirnov Alexander
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> чт, 12 мая 2022 г. в 14:23, Martijn Visser <
> >>>>>>>>>>>>>>>>> mart...@ververica.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With regards to:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But if there are plans to refactor all
> >>>>>>> connectors
> >>>>>>>>> to
> >>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, FLIP-27 is the target for all connectors.
> >>>>>>> The
> >>>>>>>>> old
> >>>>>>>>>>>>>>>>> interfaces will be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> deprecated and connectors will either be
> >>>>>>>> refactored
> >>>>>>>>> to
> >>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>> the new ones
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> dropped.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> The caching should work for connectors that
> >>>>>> are
> >>>>>>>>> using
> >>>>>>>>>>>>>>>> FLIP-27
> >>>>>>>>>>>>>>>>> interfaces,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> we should not introduce new features for old
> >>>>>>>>>>> interfaces.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, 12 May 2022 at 06:19, Александр
> >>>>>> Смирнов
> >>>>>>> <
> >>>>>>>>>>>>>>>>> smirale...@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jark!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late response. I would like to
> >>>>>>> make
> >>>>>>>>>>> some
> >>>>>>>>>>>>>>>>> comments and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> clarify my points.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I agree with your first statement. I think
> >>>>>>> we
> >>>>>>>>> can
> >>>>>>>>>>>>>>>> achieve
> >>>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> advantages this way: put the Cache interface
> >>>>>> in
> >>>>>>>>>>>>>>>>> flink-table-common,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but have implementations of it in
> >>>>>>>>>>> flink-table-runtime.
> >>>>>>>>>>>>>>>>> Therefore if a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connector developer wants to use existing
> >>>>>> cache
> >>>>>>>>>>>>>>> strategies
> >>>>>>>>>>>>>>>>> and their
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementations, he can just pass
> >>>>>> lookupConfig
> >>>>>>> to
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>> planner, but if
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> he wants to have its own cache implementation
> >>>>>>> in
> >>>>>>>>> his
> >>>>>>>>>>>>>>>>> TableFunction, it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be possible for him to use the existing
> >>>>>>>>>>> interface
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> purpose (we can explicitly point this out in
> >>>>>>> the
> >>>>>>>>>>>>>>>>> documentation). In
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this way all configs and metrics will be
> >>>>>>> unified.
> >>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > If a filter can prune 90% of data in the cache, we will have 90% of lookup
> > requests that can never be cached
>
> 2) Let me clarify the filters optimization in the case of the LRU cache. It
> looks like Cache<RowData, Collection<RowData>>. Here we always store the
> response of the dimension table in the cache, even after applying the calc
> function. I.e. if there are no rows left after applying filters to the result
> of the 'eval' method of the TableFunction, we store an empty list for those
> lookup keys. The cache line will therefore still be filled, but it will
> require much less memory (in bytes). I.e. we don't completely filter out keys
> whose result was pruned, but we significantly reduce the memory required to
> store that result. If the user knows about this behavior, he can increase the
> 'max-rows' option before the start of the job. But actually I came up with
> the idea that we can do this automatically by using the 'maximumWeight' and
> 'weigher' methods of the Guava cache [1]. The weight can be the size of the
> collection of rows (the cache value). The cache can therefore automatically
> fit many more records than before.
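
As a minimal sketch of that weigher idea (the class and method names here are
made up for the example; only the Guava CacheBuilder/Weigher API is real), the
cache is bounded by the total number of cached rows rather than by the number
of keys, so keys whose result was pruned to an empty list cost almost nothing:

import java.util.Collection;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.Weigher;

import org.apache.flink.table.data.RowData;

public final class WeighedLookupCacheSketch {

    // Bound the cache by the total number of cached rows instead of the number
    // of keys; empty results get a minimal weight of 1 so they stay evictable.
    public static Cache<RowData, Collection<RowData>> create(long maxTotalRows) {
        return CacheBuilder.newBuilder()
                .weigher(
                        new Weigher<RowData, Collection<RowData>>() {
                            @Override
                            public int weigh(RowData key, Collection<RowData> rows) {
                                return Math.max(1, rows.size());
                            }
                        })
                .maximumWeight(maxTotalRows)
                .build();
    }

    private WeighedLookupCacheSketch() {}
}
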
>
> > Flink SQL has provided a standard way to do filters and projects pushdown,
> > i.e., SupportsFilterPushDown and SupportsProjectionPushDown. Jdbc/hive/HBase
> > haven't implemented the interfaces, don't mean it's hard to implement.
>
> It's debatable how difficult it will be to implement filter pushdown. But I
> think the fact that there is currently no database connector with filter
> pushdown at least means that this feature won't be supported in connectors
> soon. Moreover, if we talk about other connectors (not in the Flink repo),
> their databases might not support all Flink filters (or not support filters
> at all). I think users are interested in supporting the cache filters
> optimization independently of other features and of solving more complex (or
> even unsolvable) problems.
>
> 3) I agree with your third statement. Actually, in our internal version I
> also tried to unify the logic of scanning and of reloading data from
> connectors. But unfortunately, I didn't find a way to unify the logic of all
> ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...) and reuse it
> for reloading the ALL cache. As a result I settled on using InputFormat,
> because it is used for scanning in all lookup connectors. (I didn't know that
> there are plans to deprecate InputFormat in favor of the FLIP-27 Source.) IMO
> using the FLIP-27 source for ALL caching is not a good idea, because this
> source was designed to work in a distributed environment (SplitEnumerator on
> the JobManager and SourceReaders on the TaskManagers), not inside one
> operator (the lookup join operator in our case). There is not even a direct
> way to pass splits from the SplitEnumerator to a SourceReader (this logic
> works through the SplitEnumeratorContext, which requires
> OperatorCoordinator.SubtaskGateway to send AddSplitEvents). Using InputFormat
> for the ALL cache seems much clearer and easier. But if there are plans to
> refactor all connectors to FLIP-27, I have the following idea: maybe we can
> drop the lookup join ALL cache in favor of a simple join with repeated
> scanning of a batch source? The point is that the only difference between a
> lookup join with ALL cache and a simple join with a batch source is that in
> the first case scanning is performed multiple times, and the state (cache) is
> cleared in between (correct me if I'm wrong). So what if we extend the
> functionality of the simple join to support state reloading, and extend the
> functionality of scanning a batch source multiple times (this one should be
> easy with the new FLIP-27 source, which unifies streaming/batch reading - we
> would only need to change the SplitEnumerator so that it passes the splits
> again after some TTL)? WDYT? I must say that this looks like a long-term goal
> and would make the scope of this FLIP even larger than you said. Maybe we can
> limit ourselves to a simpler solution (InputFormats) for now.
>
> So to sum up, my points are these:
> 1) There is a way to make both concise and flexible interfaces for caching in
> lookup join.
> 2) The cache filters optimization is important both in LRU and ALL caches.
> 3) It is unclear when filter pushdown will be supported in Flink connectors,
> some of the connectors might not have the opportunity to support filter
> pushdown, and as far as I know filter pushdown currently works only for
> scanning (not lookup). So the cache filters + projections optimization should
> be independent of other features.
> 4) The ALL cache realization is a complex topic that involves multiple
> aspects of how Flink is developing. Dropping InputFormat in favor of the
> FLIP-27 Source will make the ALL cache realization really complex and
> unclear, so maybe instead we can extend the functionality of the simple join,
> or keep InputFormat for the lookup join ALL cache?
>
> Best regards,
> Smirnov Alexander
>
> [1]
> https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)
>
> On Thu, 5 May 2022 at 20:34, Jark Wu <imj...@gmail.com> wrote:
>
> It's great to see the active discussion! I want to share my ideas:
>
> 1) Implement the cache in the framework vs. in a connector base:
> I don't have a strong opinion on this. Both ways should work (e.g., cache
> pruning, compatibility). The framework way can provide more concise
> interfaces. The connector-base way can define more flexible cache
> strategies/implementations. We are still investigating whether we can have
> both advantages. We should reach a consensus that the chosen way is a final
> state and that we are on the path to it.
>
> 2) Filters and projections pushdown:
> I agree with Alex that the filter pushdown into the cache can benefit the ALL
> cache a lot. However, this is not true for the LRU cache. Connectors use the
> cache to reduce IO requests to databases for better throughput. If a filter
> can prune 90% of the data in the cache, we will have 90% of lookup requests
> that can never be cached and hit the databases directly. That means the cache
> is meaningless in this case.
> IMO, Flink SQL has provided a standard way to do filters and projects
> pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown. That
> Jdbc/Hive/HBase haven't implemented the interfaces doesn't mean it's hard to
> implement. They should implement the pushdown interfaces to reduce IO and the
> cache size. The final state should be that the scan source and the lookup
> source share the exact same pushdown implementation. I don't see why we need
> to duplicate the pushdown logic in caches, which would complicate the lookup
> join design.
>
> 3) ALL cache abstraction:
> The ALL cache might be the most challenging part of this FLIP. We have never
> provided a reload-lookup public interface. Currently, we put the reload logic
> in the "eval" method of the TableFunction. That's hard for some sources
> (e.g., Hive). Ideally, connector implementations should share the logic of
> reload and scan, i.e. ScanTableSource with InputFormat / SourceFunction /
> FLIP-27 Source. However, InputFormat/SourceFunction are deprecated, and the
> FLIP-27 source is deeply coupled with the SourceOperator. If we want to
> invoke the FLIP-27 source in LookupJoin, this may make the scope of this FLIP
> much larger. We are still investigating how to abstract the ALL cache logic
> and reuse the existing source interfaces.
>
> Best,
> Jark
>
> On Thu, 5 May 2022 at 20:22, Roman Boyko <ro.v.bo...@gmail.com> wrote:
>
> It's a much more complicated activity and lies outside the scope of this
> improvement, because such pushdowns should be done for all ScanTableSource
> implementations (not only for the lookup ones).
>
> On Thu, 5 May 2022 at 19:02, Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi everyone,
>
> One question regarding "And Alexander correctly mentioned that filter
> pushdown still is not implemented for jdbc/hive/hbase": would an alternative
> solution be to actually implement these filter pushdowns? I can imagine that
> there are many more benefits to doing that, outside of lookup caching and
> metrics.
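
For context, implementing that ability on a source boils down to accepting the
planner's filter expressions and telling it which ones the connector will apply
itself. A rough sketch follows; the table source class and the predicate
translation helper are hypothetical, and only SupportsFilterPushDown and its
Result type are the real Flink interfaces.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

// Hypothetical JDBC-like table source; only the pushdown-related part is shown.
public class MyJdbcTableSourceSketch implements SupportsFilterPushDown {

    private final List<String> pushedPredicates = new ArrayList<>();

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            // toSqlPredicate is a made-up helper: translate the expression into
            // a WHERE-clause fragment if the database supports it, otherwise
            // keep the filter in Flink by reporting it as "remaining".
            String predicate = toSqlPredicate(filter);
            if (predicate != null) {
                pushedPredicates.add(predicate);
                accepted.add(filter);
            } else {
                remaining.add(filter);
            }
        }
        return Result.of(accepted, remaining);
    }

    private String toSqlPredicate(ResolvedExpression filter) {
        return null; // placeholder for real expression-to-SQL translation
    }
}
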
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> On Thu, 5 May 2022 at 13:58, Roman Boyko <ro.v.bo...@gmail.com> wrote:
>
> Hi everyone!
>
> Thanks for driving such a valuable improvement!
>
> I do think that a single cache implementation would be a nice opportunity for
> users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
> anyway - it doesn't matter how it is implemented.
>
> Putting myself in the user's shoes, I can say that:
> 1) I would prefer to have the opportunity to cut down the cache size by
> simply filtering out unnecessary data. And the handiest way to do that is to
> apply it inside the LookupRunners. It would be a bit harder to pass it
> through the LookupJoin node to the TableFunction. And Alexander correctly
> mentioned that filter pushdown still is not implemented for jdbc/hive/hbase.
> 2) The ability to set different caching parameters for different tables is
> quite important. So I would prefer to set it through DDL rather than have the
> same TTL, strategy and other options for all lookup tables (see the DDL
> sketch after this list).
> 3) Providing the cache inside the framework really deprives us of
> extensibility (users won't be able to implement their own cache). But most
> probably that can be solved by creating more cache strategies and a wider set
> of configurations.
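
As an aside, per-table configuration is roughly what today's JDBC connector
already offers. In the sketch below the table name and connection settings are
invented, while 'lookup.cache.max-rows' and 'lookup.cache.ttl' are the JDBC
connector's existing lookup cache options, declared per table in its DDL:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PerTableCacheOptionsExample {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(
                        EnvironmentSettings.newInstance().inStreamingMode().build());

        // Each dimension table declares its own cache size and TTL, so two
        // lookup tables can use entirely different caching parameters.
        tEnv.executeSql(
                "CREATE TABLE users ("
                        + "  user_id BIGINT,"
                        + "  name STRING,"
                        + "  age INT"
                        + ") WITH ("
                        + "  'connector' = 'jdbc',"
                        + "  'url' = 'jdbc:mysql://example-host:3306/db',"
                        + "  'table-name' = 'users',"
                        + "  'lookup.cache.max-rows' = '100000',"
                        + "  'lookup.cache.ttl' = '10 min'"
                        + ")");
    }
}
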
>
> All these points are much closer to the scheme proposed by Alexander.
> Qingsheng Ren, please correct me if I'm wrong and all these facilities can be
> easily implemented in your architecture?
>
> Best regards,
> Roman Boyko
> e.: ro.v.bo...@gmail.com
>
> On Wed, 4 May 2022 at 21:01, Martijn Visser <martijnvis...@apache.org> wrote:
>
> Hi everyone,
>
> I don't have much to chip in, but just wanted to express that I really
> appreciate the in-depth discussion on this topic and I hope that others will
> join the conversation.
>
> Best regards,
>
> Martijn
>
> On Tue, 3 May 2022 at 10:15, Александр Смирнов <smirale...@gmail.com> wrote:
>
> Hi Qingsheng, Leonard and Jark,
>
> Thanks for your detailed feedback! However, I have questions about some of
> your statements (maybe I didn't get something?).
>
> > Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time"
>
> I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not fully
> preserved with caching, but as you said, users accept that consciously to
> achieve better performance (no one proposed to enable caching by default,
> etc.). Or by users do you mean other developers of connectors? In that case
> developers explicitly specify whether their connector supports caching or not
> (in the list of supported options); no one makes them do that if they don't
> want to. So what exactly is the difference between implementing caching in
> flink-table-runtime and in flink-table-common from this point of view? How
> does it affect breaking or not breaking the semantics of "FOR SYSTEM_TIME AS
> OF proc_time"?
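
For readers less familiar with the clause under discussion, a minimal
processing-time lookup join looks roughly like the query below. The table and
column names are invented for the example, and the DDL for the 'orders' stream
and the 'users' dimension table is omitted:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(
                        EnvironmentSettings.newInstance().inStreamingMode().build());

        // Every incoming order triggers a lookup into 'users' "as of" the
        // order's processing time; this is the access pattern that a lookup
        // cache would short-circuit.
        tEnv.executeSql(
                "SELECT o.order_id, u.name"
                        + " FROM orders AS o"
                        + " JOIN users FOR SYSTEM_TIME AS OF o.proc_time AS u"
                        + " ON o.user_id = u.user_id");
    }
}
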
>
> > confront a situation that allows table options in DDL to control the
> > behavior of the framework, which has never happened previously and should
> > be cautious
>
> If we talk about the main difference in semantics between DDL options and
> config options ("table.exec.xxx"), isn't it about the scope of the options
> and their importance for the user's business logic, rather than about the
> specific location of the corresponding logic in the framework? I mean that in
> my design, for example, putting an option with the lookup cache strategy into
> the configurations would be the wrong decision, because it directly affects
> the user's business logic (not just performance optimization) and touches
> just a few functions of ONE table (there can be multiple tables with
> different caches). Does it really matter for the user (or anyone else) where
> the logic affected by the applied option is located?
> Also I can recall the DDL option 'sink.parallelism', which in some way
> "controls the behavior of the framework", and I don't see any problem there.
>
> > introduce a new interface for this all-caching scenario and the design
> > would become more complex
>
> This is a subject for a separate discussion, but actually in our internal
> version we solved this problem quite easily - we reused the InputFormat class
> (so there is no need for a new API). The point is that currently all lookup
> connectors use InputFormat for scanning the data in batch mode: HBase, JDBC
> and even Hive - it uses the class PartitionReader, which is actually just a
> wrapper around InputFormat. The advantage of this solution is the ability to
> reload the cache data in parallel (the number of threads depends on the
> number of InputSplits, but has an upper limit). As a result the cache reload
> time drops significantly (as well as the time the input stream is blocked). I
> know that we usually try to avoid concurrency in Flink code, but maybe this
> one can be an exception. BTW I don't say that it's an ideal solution, maybe
> there are better ones.
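
A very rough illustration of that parallel reload idea (this is a made-up
helper, not the internal implementation being described): read all InputSplits
of an InputFormat with a bounded thread pool, using one format instance per
task, and collect the rows for the new cache snapshot.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

import org.apache.flink.api.common.io.InputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.table.data.RowData;

public final class ParallelCacheReloadSketch {

    public static List<RowData> reload(
            Supplier<InputFormat<RowData, InputSplit>> formatFactory, int maxThreads)
            throws Exception {

        InputFormat<RowData, InputSplit> probe = formatFactory.get();
        probe.configure(new Configuration());
        InputSplit[] splits = probe.createInputSplits(maxThreads);

        ExecutorService pool =
                Executors.newFixedThreadPool(
                        Math.max(1, Math.min(maxThreads, splits.length)));
        List<Future<List<RowData>>> futures = new ArrayList<>();
        for (InputSplit split : splits) {
            // One fresh InputFormat instance per split, so threads never share state.
            futures.add(pool.submit(() -> readSplit(formatFactory.get(), split)));
        }

        List<RowData> allRows = new ArrayList<>();
        for (Future<List<RowData>> future : futures) {
            allRows.addAll(future.get());
        }
        pool.shutdown();
        return allRows;
    }

    private static List<RowData> readSplit(
            InputFormat<RowData, InputSplit> format, InputSplit split) throws Exception {
        format.configure(new Configuration());
        format.open(split);
        List<RowData> rows = new ArrayList<>();
        while (!format.reachedEnd()) {
            RowData row = format.nextRecord(null);
            if (row != null) {
                rows.add(row);
            }
        }
        format.close();
        return rows;
    }

    private ParallelCacheReloadSketch() {}
}
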
>
> > Providing the cache in the framework might introduce compatibility issues
>
> That is possible only when the developer of the connector doesn't properly
> refactor his code and uses the new cache options incorrectly (i.e. explicitly
> provides the same options in 2 different code places). For correct behavior
> all he needs to do is redirect the existing options to the framework's
> LookupConfig (and maybe add an alias for options if the naming differed);
> everything will be transparent for users. If the developer doesn't do the
> refactoring at all, nothing changes for the connector because of backward
> compatibility. Also, if a developer wants to use his own cache logic, he can
> simply refuse to pass some of the configs to the framework and instead make
> his own implementation with the already existing configs and metrics (but
> actually I think that's a rare case).
>
> > filters and projections should be pushed all the way down to the table
> > function, like what we do in the scan source
>
> It's a great goal. But the truth is that the ONLY connector that supports
> filter pushdown is FileSystemTableSource (no database connector supports it
> currently). Also, for some databases it's simply impossible to push down such
> complex filters as we have in Flink.
>
> > only applying these optimizations to the cache seems not quite useful
>
> Filters can cut off an arbitrarily large amount of data from the dimension
> table. For a simple example, suppose the dimension table 'users' has a column
> 'age' with values from 20 to 40, and the input stream 'clicks' is roughly
> uniformly distributed over the users' age. If we have the filter 'age > 30',
> there will be half as much data in the cache. This means the user can
> increase 'lookup.cache.max-rows' by almost 2 times, which gives a huge
> performance boost. Moreover, this optimization really starts to shine with
> the 'ALL' cache, where tables can't fit in memory without filters and
> projections, but can with them. This opens up additional possibilities for
> users, and it doesn't sound like 'not quite useful'.
>
> It would be great to hear other voices on this topic! We have quite a lot of
> controversial points, and I think that with the help of others it will be
> easier for us to come to a consensus.
>
> Best regards,
> Smirnov Alexander
>
> On Fri, 29 Apr 2022 at 22:33, Qingsheng Ren <renqs...@gmail.com> wrote:
>
> Hi Alexander and Arvid,
>
> Thanks for the discussion and sorry for my late response! We had an internal
> discussion together with Jark and Leonard and I'd like to summarize our
> ideas. Instead of implementing the cache logic in the table runtime layer or
> wrapping around the user-provided table function, we prefer to introduce some
> new APIs extending TableFunction, with these concerns:
>
> 1. Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time",
> because it can't truly reflect the content of the lookup table at the moment
> of querying. If users choose to enable caching on the lookup table, they
> implicitly indicate that this breakage is acceptable in exchange for the
> performance. So we prefer not to provide caching on the table runtime level.
>
> 2. If we make the cache implementation in the framework (whether in a runner
> or in a wrapper around TableFunction), we have to confront a situation that
> allows table options in DDL to control the behavior of the framework, which
> has never happened previously and should be treated cautiously. Under the
> current design the behavior of the framework should only be specified by
> configurations ("table.exec.xxx"), and it's hard to apply these general
> configs to a specific table.
>
> 3. We have use cases where the lookup source loads and refreshes all records
> periodically into memory to achieve high lookup performance (like the Hive
> connector in the community, and also widely used by our internal connectors).
> Wrapping the cache around the user's TableFunction works fine for LRU caches,
> but I think we would have to introduce a new interface for this all-caching
> scenario and the design would become more complex.
>
> 4. Providing the cache in the framework might introduce compatibility issues
> to existing lookup sources: there might exist two caches with totally
> different strategies if the user configures the table incorrectly (one in the
> framework and another implemented by the lookup source).
>
> As for the optimization mentioned by Alexander, I think filters and
> projections should be pushed all the way down to the table function, like
> what we do in the scan source, instead of to the runner with the cache. The
> goal of using a cache is to reduce the network I/O and the pressure on the
> external system, and only applying these optimizations to the cache does not
> seem very useful.
>
> I made some updates to the FLIP [1] to reflect our ideas. We prefer to keep
> the cache implementation as a part of the TableFunction, and we could provide
> some helper classes (CachingTableFunction, AllCachingTableFunction,
> CachingAsyncTableFunction) to developers and regulate the metrics of the
> cache. Also, I made a POC [2] for your reference.
>
> Looking forward to your ideas!
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
> [2] https://github.com/PatrickRen/flink/tree/FLIP-221
>
> Best regards,
> Qingsheng
>
> On Tue, Apr 26, 2022 at 4:45 PM Александр Смирнов <smirale...@gmail.com> wrote:
>
> Thanks for the response, Arvid!
>
> I have a few comments on your message.
>
> > but could also live with an easier solution as the first step:
>
> I think that these 2 ways (the one originally proposed by Qingsheng and mine)
> are mutually exclusive, because conceptually they follow the same goal, but
> the implementation details are different. If we go one way, moving to the
> other way in the future will mean deleting existing code and once again
> changing the API for connectors. So I think we should reach a consensus with
> the community about that and then work together on this FLIP, i.e. divide the
> work into tasks for different parts of the FLIP (for example, LRU cache
> unification / introducing the proposed set of metrics / further work...).
> WDYT, Qingsheng?
>
> > as the source will only receive the requests after filter
>
> Actually, if filters are applied to fields of the lookup table, we first have
> to do the requests, and only after that can we filter the responses, because
> lookup connectors don't have filter pushdown. So if filtering is done before
> caching, there will be far fewer rows in the cache.
>
> > @Alexander unfortunately, your architecture is not shared. I don't know the
> > solution to share images to be honest.
>
> Sorry for that, I'm a bit new to these kinds of conversations :) I have no
> write access to the confluence, so I created a Jira issue where I described
> the proposed changes in more detail -
> https://issues.apache.org/jira/browse/FLINK-27411.
>
> I will be happy to get more feedback!
>
> Best,
> Smirnov Alexander
>
> On Mon, 25 Apr 2022 at 19:49, Arvid Heise <ar...@apache.org> wrote:
>
> Hi Qingsheng,
>
> Thanks for driving this; the inconsistency was not satisfying for me.
>
> I second Alexander's idea, but could also live with an easier solution as the
> first step: instead of making caching an implementation detail of a
> TableFunction X, rather devise a caching layer around X. So the proposal
> would be a CachingTableFunction that delegates to X in case of misses and
> otherwise manages the cache. Lifting it into the operator model as proposed
> would be even better, but is probably unnecessary in the first step for a
> lookup source (as the source will only receive the requests after the filter;
> applying the projection may be more interesting to save memory).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
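For illustration, here is a minimal sketch of the "caching layer around X"
idea described above, written in plain Java. The class and method names are
made up for this example and are not part of the FLIP or of Flink's API; a
real CachingTableFunction would wrap Flink's TableFunction and bound the
cache.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Hypothetical caching layer around a lookup function X: hits are served
    // from the local map, misses are delegated to X and the result is stored.
    public final class DelegatingLookupCache<K, V> {

        private final Function<K, List<V>> delegate;              // the wrapped "X"
        private final Map<K, List<V>> cache = new ConcurrentHashMap<>();

        public DelegatingLookupCache(Function<K, List<V>> delegate) {
            this.delegate = delegate;
        }

        public List<V> lookup(K key) {
            // computeIfAbsent only calls the delegate when the key is missing
            return cache.computeIfAbsent(key, delegate);
        }
    }

An unbounded map is of course only for illustration; a production cache would
bound its size and expire entries.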
> Another advantage is that all the changes of this FLIP would be limited
> to options, with no need for new public interfaces. Everything else
> remains an implementation detail of the Table runtime. That means we can
> easily incorporate the optimization potential that Alexander pointed out
> later.
>
> @Alexander unfortunately, your architecture image did not come through.
> To be honest, I don't know of a good way to share images on the list.
>
> On Fri, Apr 22, 2022 at 5:04 PM Александр Смирнов <smirale...@gmail.com> wrote:
>
> Hi Qingsheng! My name is Alexander; I'm not a committer yet, but I'd
> really like to become one, and this FLIP really interests me. I have
> actually worked on a similar feature in my company's Flink fork, and we
> would like to share our thoughts on this and open-source the code.
>
> I think there is a better alternative than introducing an abstract class
> for TableFunction (CachingTableFunction). As you know, TableFunction
> lives in the flink-table-common module, which provides only an API for
> working with tables and is therefore very convenient for connectors to
> depend on. In contrast, CachingTableFunction contains runtime execution
> logic, so this class and everything connected with it would have to live
> in another module, probably flink-table-runtime. But that would require
> connectors to depend on a module that contains a lot of runtime logic,
> which doesn't sound good.
>
> Instead, I suggest adding a new method 'getLookupConfig' to
> LookupTableSource or LookupRuntimeProvider so that connectors only pass
> configuration to the planner and never depend on the runtime
> implementation. Based on these configs, the planner constructs a lookup
> join operator with the corresponding runtime logic (ProcessFunctions in
> flink-table-runtime). The architecture is shown in the pinned image (the
> LookupConfig class there is essentially your CacheConfig).
>
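As a rough illustration of what that hand-off could look like: the sketch
below is hypothetical, not actual Flink API (LookupTableSource exists, but
getLookupConfig and the LookupConfig shown here do not); it only shows the
shape of a declarative config that a connector could hand to the planner.

    import java.time.Duration;

    // Hypothetical config object the connector would hand to the planner.
    final class LookupConfig {
        final long maxCachedRows;
        final Duration expireAfterWrite;

        LookupConfig(long maxCachedRows, Duration expireAfterWrite) {
            this.maxCachedRows = maxCachedRows;
            this.expireAfterWrite = expireAfterWrite;
        }
    }

    // Hypothetical connector-facing hook: a purely declarative description
    // of caching, with no dependency on flink-table-runtime; the planner
    // would read it when building the lookup join operator.
    interface LookupConfigProvider {
        default LookupConfig getLookupConfig() {
            return new LookupConfig(10_000, Duration.ofMinutes(10));
        }
    }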
> The classes in flink-table-planner responsible for this would be
> CommonPhysicalLookupJoin and its subclasses. The current lookup join
> classes in flink-table-runtime are LookupJoinRunner,
> AsyncLookupJoinRunner, LookupJoinRunnerWithCalc and
> AsyncLookupJoinRunnerWithCalc; I suggest adding LookupJoinCachingRunner,
> LookupJoinCachingRunnerWithCalc, etc.
>
> And here comes another, more powerful advantage of such a solution: if
> the caching logic lives at this lower level, we can apply optimizations
> to it. LookupJoinRunnerWithCalc is named that way because it applies the
> 'calc' function, which mostly consists of filters and projections.
>
> For example, when joining table A with lookup table B using
> 'JOIN … ON A.id = B.id AND A.age = B.age + 10 WHERE B.salary > 1000',
> the 'calc' function will contain the filters A.age = B.age + 10 and
> B.salary > 1000.
>
> If we apply this function before storing records in the cache, the cache
> size can be significantly reduced: filters avoid storing useless records,
> and projections shrink each record. The user can then raise the maximum
> number of records kept in the cache.
>
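A minimal sketch of that idea in plain Java is below; all names are
illustrative. Note that only predicates depending on the looked-up row alone
(such as B.salary > 1000 in the example) can safely be applied before
caching, since a condition like A.age = B.age + 10 also depends on the probe
row and must still be evaluated per join request.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    // Hypothetical cache that runs filter + projection before storing rows,
    // so only rows that can still match, and only the needed columns, use
    // cache memory.
    public final class CalcAwareLookupCache<K, R, P> {

        private final Function<K, List<R>> lookup;   // raw lookup against table B
        private final Predicate<R> filter;           // lookup-side filter, e.g. salary > 1000
        private final Function<R, P> projection;     // keep only the required columns

        private final Map<K, List<P>> cache = new ConcurrentHashMap<>();

        public CalcAwareLookupCache(Function<K, List<R>> lookup,
                                    Predicate<R> filter,
                                    Function<R, P> projection) {
            this.lookup = lookup;
            this.filter = filter;
            this.projection = projection;
        }

        public List<P> lookup(K key) {
            // filter and project before the rows ever reach the cache
            return cache.computeIfAbsent(key, k -> lookup.apply(k).stream()
                    .filter(filter)
                    .map(projection)
                    .collect(Collectors.toList()));
        }
    }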
> What do you think about it?
>
> On 2022/04/19 02:47:11 Qingsheng Ren wrote:
>
> Hi devs,
>
> Yuan and I would like to start a discussion about FLIP-221 [1], which
> introduces an abstraction for the lookup table cache and its standard
> metrics.
>
> Currently every lookup table source has to implement its own cache to
> store lookup results, and there is no standard set of metrics that users
> and developers can use to tune their jobs with lookup joins, which are a
> quite common use case in Flink Table / SQL.
>
> Therefore we propose some new APIs, including the cache, metrics, wrapper
> classes for TableFunction, and new table options. Please take a look at
> the FLIP page [1] for more details. Any suggestions and comments would be
> appreciated!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
>
> Best regards,
>
> Qingsheng
>
>
> --
> Best Regards,
>
> Qingsheng Ren
>
> Real-time Computing Team
> Alibaba Cloud
>
> Email: renqs...@gmail.com
>
> --
> Best regards,
> Roman Boyko
> e.: ro.v.bo...@gmail.com
>
> --
> Best Regards,
>
> Qingsheng Ren
>
> Real-time Computing Team
> Alibaba Cloud
>
> Email: renqs...@gmail.com
