Hi devs, I’d like to push FLIP-221 forward a little bit. Recently we had some offline discussions and updated the FLIP. Here’s the diff compared to the previous version:
1. (Async)LookupFunctionProvider is designed as a base interface for constructing lookup functions.
2. From the (Async)LookupFunctionProvider we extend PartialCaching / FullCachingLookupProvider for the partial and full caching modes.
3. Introduce CacheReloadTrigger for specifying the reload strategy in full caching mode, and provide 2 default implementations (Periodic / TimedCacheReloadTrigger).

Looking forward to your replies~

Best,
Qingsheng

> On Jun 2, 2022, at 17:15, Qingsheng Ren <renqs...@gmail.com> wrote:
>
> Hi Becket,
>
> Thanks for your feedback!
>
> 1. An alternative way is to let the implementation of the cache decide
> whether to store a missing key in the cache, instead of the framework.
> This sounds more reasonable and makes the LookupProvider interface
> cleaner. I can update the FLIP and clarify in the JavaDoc of
> LookupCache#put that the cache should decide whether to store an empty
> collection.
>
> 2. Initially the builder pattern was for the extensibility of the
> LookupProvider interfaces, as we may need to add more configurations
> in the future. We can remove the builder now that we have resolved the
> issue in 1. As for the builder in DefaultLookupCache, I prefer to keep
> it because we have a lot of arguments in the constructor.
>
> 3. I think this might overturn the overall design. I agree with
> Becket's idea that the API design should be layered considering
> extensibility, and it would be great to have one unified interface
> supporting partial, full and even mixed custom strategies, but we have
> some issues to resolve. The original purpose of treating full caching
> separately is that we'd like to reuse the ability of
> ScanRuntimeProvider. Developers just need to hand over a Source /
> SourceFunction / InputFormat so that the framework is able to compose
> the underlying topology and control the reload (maybe in a
> distributed way).
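For readers following the thread: the hierarchy summarized in points 1-3 above could be sketched roughly as below. All method names, signatures, and the LookupFn stand-in are my guesses for illustration; the authoritative definitions are in FLIP-221 itself.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;

public class Flip221Sketch {

    /** Base provider for constructing lookup functions (point 1). */
    public interface LookupFunctionProvider {
        LookupFn createLookupFunction();
    }

    /** Partial caching mode: entries are cached lazily as keys are looked up (point 2). */
    public interface PartialCachingLookupProvider extends LookupFunctionProvider {}

    /** Full caching mode: the whole table is loaded and later reloaded (point 2). */
    public interface FullCachingLookupProvider extends LookupFunctionProvider {
        CacheReloadTrigger getCacheReloadTrigger();
    }

    /** Decides when a full reload happens (point 3). */
    public interface CacheReloadTrigger {}

    /** One of the two default triggers named above: reload at a fixed interval. */
    public static final class PeriodicCacheReloadTrigger implements CacheReloadTrigger {
        private final Duration interval;

        public PeriodicCacheReloadTrigger(Duration interval) {
            this.interval = interval;
        }

        public Duration getInterval() {
            return interval;
        }
    }

    /** Stand-in for Flink's LookupFunction, to keep this sketch self-contained. */
    public interface LookupFn {
        Collection<String> lookup(String key);
    }

    public static void main(String[] args) {
        // Wire up a toy full-caching provider to show how the pieces relate.
        FullCachingLookupProvider provider = new FullCachingLookupProvider() {
            public LookupFn createLookupFunction() {
                return key -> Collections.singletonList("row-for-" + key);
            }

            public CacheReloadTrigger getCacheReloadTrigger() {
                return new PeriodicCacheReloadTrigger(Duration.ofMinutes(10));
            }
        };
        System.out.println(provider.createLookupFunction().lookup("k1"));
    }
}
```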
Under your design we leave the reload operation > totally to the CacheStrategy and I think it will be hard for > developers to reuse the source in the initializeCache method. > > Best regards, > > Qingsheng > > On Thu, Jun 2, 2022 at 1:50 PM Becket Qin <becket....@gmail.com> wrote: >> >> Thanks for updating the FLIP, Qingsheng. A few more comments: >> >> 1. I am still not sure about what is the use case for cacheMissingKey(). >> More specifically, when would users want to have getCache() return a >> non-empty value and cacheMissingKey() returns false? >> >> 2. The builder pattern. Usually the builder pattern is used when there are >> a lot of variations of constructors. For example, if a class has three >> variables and all of them are optional, so there could potentially be many >> combinations of the variables. But in this FLIP, I don't see such case. >> What is the reason we have builders for all the classes? >> >> 3. Should the caching strategy be excluded from the top level provider API? >> Technically speaking, the Flink framework should only have two interfaces >> to deal with: >> A) LookupFunction >> B) AsyncLookupFunction >> Orthogonally, we *believe* there are two different strategies people can do >> caching. Note that the Flink framework does not care what is the caching >> strategy here. >> a) partial caching >> b) full caching >> >> Putting them together, we end up with 3 combinations that we think are >> valid: >> Aa) PartialCachingLookupFunctionProvider >> Ba) PartialCachingAsyncLookupFunctionProvider >> Ab) FullCachingLookupFunctionProvider >> >> However, the caching strategy could actually be quite flexible. E.g. an >> initial full cache load followed by some partial updates. Also, I am not >> 100% sure if the full caching will always use ScanTableSource. Including >> the caching strategy in the top level provider API would make it harder to >> extend. 
>> One possible solution is to just have *LookupFunctionProvider* and
>> *AsyncLookupFunctionProvider* as the top level API, both with a
>> getCacheStrategy() method returning an optional CacheStrategy. The
>> CacheStrategy class would have the following methods:
>> 1. void open(Context), where the context exposes some of the resources
>> that may be useful for the caching strategy, e.g. an ExecutorService
>> that is synchronized with the data processing, or a cache refresh
>> trigger which blocks data processing and refreshes the cache.
>> 2. void initializeCache(), a blocking method that allows users to
>> pre-populate the cache before processing any data if they wish.
>> 3. void maybeCache(RowData key, Collection<RowData> value), a blocking
>> or non-blocking method.
>> 4. void refreshCache(), a blocking / non-blocking method that is invoked
>> by the Flink framework when the cache refresh trigger is pulled.
>>
>> In the above design, partial caching and full caching would be
>> implementations of the CacheStrategy. And it is OK for users to
>> implement their own CacheStrategy if they want to.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On Thu, Jun 2, 2022 at 12:14 PM Jark Wu <imj...@gmail.com> wrote:
>>
>>> Thanks Qingsheng for the detailed summary and updates,
>>>
>>> The changes look good to me in general. I just have one minor
>>> improvement comment.
>>> Could we add static util methods to the "FullCachingReloadTrigger"
>>> interface for quick usage?
>>>
>>> #periodicReloadAtFixedRate(Duration)
>>> #periodicReloadWithFixedDelay(Duration)
>>>
>>> I think we can also do this for LookupCache, because users may not know
>>> where the default implementations are and how to use them.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 1 Jun 2022 at 18:32, Qingsheng Ren <renqs...@gmail.com> wrote:
>>>
>>>> Hi Jingsong,
>>>>
>>>> Thanks for your comments!
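Becket's CacheStrategy proposal quoted above, written out as code, might look like the following. This is only a reading aid: Context, the generic key/value types (standing in for RowData), and the toy partial-caching implementation are placeholders of mine, not an agreed API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;

public class CacheStrategySketch {

    /** Placeholder for the resources the framework would expose to the strategy. */
    public interface Context {
        ScheduledExecutorService executorService();
    }

    /** The unified strategy interface with the four methods listed in the proposal. */
    public interface CacheStrategy<K, V> {
        /** 1. Receives framework resources, e.g. an executor or a refresh trigger. */
        void open(Context context);

        /** 2. Blocking; lets the strategy pre-populate the cache before any data is processed. */
        void initializeCache();

        /** 3. Called per lookup result; the strategy decides whether to cache it. */
        void maybeCache(K key, Collection<V> value);

        /** 4. Invoked by the framework when the cache refresh trigger is pulled. */
        void refreshCache();
    }

    /** A toy "partial caching" strategy: cache every result, clear on refresh. */
    public static class PartialCachingStrategy<K, V> implements CacheStrategy<K, V> {
        private final Map<K, Collection<V>> cache = new HashMap<>();

        public void open(Context context) {}

        public void initializeCache() {}  // nothing to pre-load in partial mode

        public void maybeCache(K key, Collection<V> value) {
            cache.put(key, value);
        }

        public void refreshCache() {
            cache.clear();
        }

        public int cachedSize() {
            return cache.size();
        }
    }
}
```

Under this shape, a full-caching implementation would do its bulk load in initializeCache() and refreshCache(), while a partial-caching one only reacts in maybeCache().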
>>>> >>>>> AllCache definition is not flexible, for example, PartialCache can use >>>> any custom storage, while the AllCache can not, AllCache can also be >>>> considered to store memory or disk, also need a flexible strategy. >>>> >>>> We had an offline discussion with Jark and Leonard. Basically we think >>>> exposing the interface of full cache storage to connector developers >>> might >>>> limit our future optimizations. The storage of full caching shouldn’t >>> have >>>> too many variations for different lookup tables so making it pluggable >>>> might not help a lot. Also I think it is not quite easy for connector >>>> developers to implement such an optimized storage. We can keep optimizing >>>> this storage in the future and all full caching lookup tables would >>> benefit >>>> from this. >>>> >>>>> We are more inclined to deprecate the connector `async` option when >>>> discussing FLIP-234. Can we remove this option from this FLIP? >>>> >>>> Thanks for the reminder! This option has been removed in the latest >>>> version. >>>> >>>> Best regards, >>>> >>>> Qingsheng >>>> >>>> >>>>> On Jun 1, 2022, at 15:28, Jingsong Li <jingsongl...@gmail.com> wrote: >>>>> >>>>> Thanks Alexander for your reply. We can discuss the new interface when >>> it >>>>> comes out. >>>>> >>>>> We are more inclined to deprecate the connector `async` option when >>>>> discussing FLIP-234 [1]. We should use hint to let planner decide. >>>>> Although the discussion has not yet produced a conclusion, can we >>> remove >>>>> this option from this FLIP? It doesn't seem to be related to this FLIP, >>>> but >>>>> more to FLIP-234, and we can form a conclusion over there. >>>>> >>>>> [1] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h >>>>> >>>>> Best, >>>>> Jingsong >>>>> >>>>> On Wed, Jun 1, 2022 at 4:59 AM Jing Ge <j...@ververica.com> wrote: >>>>> >>>>>> Hi Jark, >>>>>> >>>>>> Thanks for clarifying it. It would be fine. 
>>>>>> as long as we could provide the no-cache solution. I was just
>>>>>> wondering if the client side cache could really help when HBase is
>>>>>> used, since the data to look up should be huge. Depending on how much
>>>>>> data will be cached on the client side, the data that should be LRU
>>>>>> in e.g. LruBlockCache will not be LRU anymore. In the worst case
>>>>>> scenario, once the cached data at the client side has expired, the
>>>>>> request will hit disk, which will cause extra latency temporarily, if
>>>>>> I am not mistaken.
>>>>>>
>>>>>> Best regards,
>>>>>> Jing
>>>>>>
>>>>>> On Mon, May 30, 2022 at 9:59 AM Jark Wu <imj...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jing Ge,
>>>>>>>
>>>>>>> What do you mean by the "impact on the block cache used by HBase"?
>>>>>>> In my understanding, the connector cache and the HBase cache are two
>>>>>>> totally different things. The connector cache is a local/client
>>>>>>> cache, and the HBase cache is a server cache.
>>>>>>>
>>>>>>>> does it make sense to have a no-cache solution as one of the
>>>>>>>> default solutions so that customers will have no effort for the
>>>>>>>> migration if they want to stick with HBase cache
>>>>>>>
>>>>>>> The implementation migration should be transparent to users. Take
>>>>>>> the HBase connector as an example: it already supports lookup cache,
>>>>>>> but it is disabled by default. After migration, the connector still
>>>>>>> disables the cache by default (i.e. the no-cache solution). No
>>>>>>> migration effort for users.
>>>>>>>
>>>>>>> The HBase cache and the connector cache are two different things;
>>>>>>> the HBase cache can't simply replace the connector cache, because
>>>>>>> one of the most important usages of the connector cache is reducing
>>>>>>> I/O requests/responses and improving throughput, which can't be
>>>>>>> achieved by just using a server cache.
>>>>>>> >>>>>>> Best, >>>>>>> Jark >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, 27 May 2022 at 22:42, Jing Ge <j...@ververica.com> wrote: >>>>>>> >>>>>>>> Thanks all for the valuable discussion. The new feature looks very >>>>>>>> interesting. >>>>>>>> >>>>>>>> According to the FLIP description: "*Currently we have JDBC, Hive >>> and >>>>>>> HBase >>>>>>>> connector implemented lookup table source. All existing >>>> implementations >>>>>>>> will be migrated to the current design and the migration will be >>>>>>>> transparent to end users*." I was only wondering if we should pay >>>>>>> attention >>>>>>>> to HBase and similar DBs. Since, commonly, the lookup data will be >>>> huge >>>>>>>> while using HBase, partial caching will be used in this case, if I >>> am >>>>>> not >>>>>>>> mistaken, which might have an impact on the block cache used by >>> HBase, >>>>>>> e.g. >>>>>>>> LruBlockCache. >>>>>>>> Another question is that, since HBase provides a sophisticated cache >>>>>>>> solution, does it make sense to have a no-cache solution as one of >>> the >>>>>>>> default solutions so that customers will have no effort for the >>>>>> migration >>>>>>>> if they want to stick with Hbase cache? >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Jing >>>>>>>> >>>>>>>> On Fri, May 27, 2022 at 11:19 AM Jingsong Li < >>> jingsongl...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I think the problem now is below: >>>>>>>>> 1. AllCache and PartialCache interface on the non-uniform, one >>> needs >>>>>> to >>>>>>>>> provide LookupProvider, the other needs to provide CacheBuilder. >>>>>>>>> 2. AllCache definition is not flexible, for example, PartialCache >>> can >>>>>>> use >>>>>>>>> any custom storage, while the AllCache can not, AllCache can also >>> be >>>>>>>>> considered to store memory or disk, also need a flexible strategy. >>>>>>>>> 3. AllCache can not customize ReloadStrategy, currently only >>>>>>>>> ScheduledReloadStrategy. 
>>>>>>>>> >>>>>>>>> In order to solve the above problems, the following are my ideas. >>>>>>>>> >>>>>>>>> ## Top level cache interfaces: >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> public interface CacheLookupProvider extends >>>>>>>>> LookupTableSource.LookupRuntimeProvider { >>>>>>>>> >>>>>>>>> CacheBuilder createCacheBuilder(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface CacheBuilder { >>>>>>>>> Cache create(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface Cache { >>>>>>>>> >>>>>>>>> /** >>>>>>>>> * Returns the value associated with key in this cache, or null >>>>>> if >>>>>>>>> there is no cached value for >>>>>>>>> * key. >>>>>>>>> */ >>>>>>>>> @Nullable >>>>>>>>> Collection<RowData> getIfPresent(RowData key); >>>>>>>>> >>>>>>>>> /** Returns the number of key-value mappings in the cache. */ >>>>>>>>> long size(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> ## Partial cache >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> public interface PartialCacheLookupFunction extends >>>>>>> CacheLookupProvider { >>>>>>>>> >>>>>>>>> @Override >>>>>>>>> PartialCacheBuilder createCacheBuilder(); >>>>>>>>> >>>>>>>>> /** Creates an {@link LookupFunction} instance. */ >>>>>>>>> LookupFunction createLookupFunction(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface PartialCacheBuilder extends CacheBuilder { >>>>>>>>> >>>>>>>>> PartialCache create(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface PartialCache extends Cache { >>>>>>>>> >>>>>>>>> /** >>>>>>>>> * Associates the specified value rows with the specified key >>> row >>>>>>>>> in the cache. If the cache >>>>>>>>> * previously contained value associated with the key, the old >>>>>>>>> value is replaced by the >>>>>>>>> * specified value. >>>>>>>>> * >>>>>>>>> * @return the previous value rows associated with key, or null >>>>>> if >>>>>>>>> there was no mapping for key. 
>>>>>>>>> * @param key - key row with which the specified value is to be >>>>>>>>> associated >>>>>>>>> * @param value – value rows to be associated with the specified >>>>>>> key >>>>>>>>> */ >>>>>>>>> Collection<RowData> put(RowData key, Collection<RowData> value); >>>>>>>>> >>>>>>>>> /** Discards any cached value for the specified key. */ >>>>>>>>> void invalidate(RowData key); >>>>>>>>> } >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> ## All cache >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> public interface AllCacheLookupProvider extends >>> CacheLookupProvider { >>>>>>>>> >>>>>>>>> void registerReloadStrategy(ScheduledExecutorService >>>>>>>>> executorService, Reloader reloader); >>>>>>>>> >>>>>>>>> ScanTableSource.ScanRuntimeProvider getScanRuntimeProvider(); >>>>>>>>> >>>>>>>>> @Override >>>>>>>>> AllCacheBuilder createCacheBuilder(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface AllCacheBuilder extends CacheBuilder { >>>>>>>>> >>>>>>>>> AllCache create(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface AllCache extends Cache { >>>>>>>>> >>>>>>>>> void putAll(Iterator<Map<RowData, RowData>> allEntries); >>>>>>>>> >>>>>>>>> void clearAll(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> public interface Reloader { >>>>>>>>> >>>>>>>>> void reload(); >>>>>>>>> } >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jingsong >>>>>>>>> >>>>>>>>> On Fri, May 27, 2022 at 11:10 AM Jingsong Li < >>> jingsongl...@gmail.com >>>>>>> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Qingsheng and all for your discussion. >>>>>>>>>> >>>>>>>>>> Very sorry to jump in so late. >>>>>>>>>> >>>>>>>>>> Maybe I missed something? >>>>>>>>>> My first impression when I saw the cache interface was, why don't >>>>>> we >>>>>>>>>> provide an interface similar to guava cache [1], on top of guava >>>>>>> cache, >>>>>>>>>> caffeine also makes extensions for asynchronous calls.[2] >>>>>>>>>> There is also the bulk load in caffeine too. 
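To make the PartialCache contract sketched earlier in this mail concrete: a minimal, non-thread-safe LRU implementation of getIfPresent / put / invalidate / size can be built on LinkedHashMap as below. RowData is replaced by String to keep the sketch self-contained, and the max-rows eviction policy is my addition, not part of the proposed interface.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache following the PartialCache contract sketched above. */
public class LruPartialCache {
    private final int maxRows;
    private final LinkedHashMap<String, Collection<String>> entries;

    public LruPartialCache(int maxRows) {
        this.maxRows = maxRows;
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once maxRows is exceeded.
        this.entries = new LinkedHashMap<String, Collection<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Collection<String>> eldest) {
                return size() > LruPartialCache.this.maxRows;
            }
        };
    }

    /** Returns the cached rows for the key, or null when the key is absent. */
    public Collection<String> getIfPresent(String key) {
        return entries.get(key);
    }

    /** Caches the rows, returning the previously cached rows or null. */
    public Collection<String> put(String key, Collection<String> value) {
        return entries.put(key, value);
    }

    /** Discards any cached value for the specified key. */
    public void invalidate(String key) {
        entries.remove(key);
    }

    /** Number of key -> rows mappings currently in the cache. */
    public long size() {
        return entries.size();
    }
}
```

Storing an empty collection under a key is how "caching a missing key" would look with this contract, which is the cacheMissingKey question discussed earlier in the thread.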
>>>>>>>>>> >>>>>>>>>> I am also more confused why first from LookupCacheFactory.Builder >>>>>> and >>>>>>>>> then >>>>>>>>>> to Factory to create Cache. >>>>>>>>>> >>>>>>>>>> [1] https://github.com/google/guava >>>>>>>>>> [2] https://github.com/ben-manes/caffeine/wiki/Population >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jingsong >>>>>>>>>> >>>>>>>>>> On Thu, May 26, 2022 at 11:17 PM Jark Wu <imj...@gmail.com> >>> wrote: >>>>>>>>>> >>>>>>>>>>> After looking at the new introduced ReloadTime and Becket's >>>>>> comment, >>>>>>>>>>> I agree with Becket we should have a pluggable reloading >>> strategy. >>>>>>>>>>> We can provide some common implementations, e.g., periodic >>>>>>> reloading, >>>>>>>>> and >>>>>>>>>>> daily reloading. >>>>>>>>>>> But there definitely be some connector- or business-specific >>>>>>> reloading >>>>>>>>>>> strategies, e.g. >>>>>>>>>>> notify by a zookeeper watcher, reload once a new Hive partition >>> is >>>>>>>>>>> complete. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jark >>>>>>>>>>> >>>>>>>>>>> On Thu, 26 May 2022 at 11:52, Becket Qin <becket....@gmail.com> >>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for updating the FLIP. A few comments / questions below: >>>>>>>>>>>> >>>>>>>>>>>> 1. Is there a reason that we have both "XXXFactory" and >>>>>>>> "XXXProvider". >>>>>>>>>>>> What is the difference between them? If they are the same, can >>>>>> we >>>>>>>> just >>>>>>>>>>> use >>>>>>>>>>>> XXXFactory everywhere? >>>>>>>>>>>> >>>>>>>>>>>> 2. Regarding the FullCachingLookupProvider, should the reloading >>>>>>>>> policy >>>>>>>>>>>> also be pluggable? Periodical reloading could be sometimes be >>>>>>> tricky >>>>>>>>> in >>>>>>>>>>>> practice. For example, if user uses 24 hours as the cache >>>>>> refresh >>>>>>>>>>> interval >>>>>>>>>>>> and some nightly batch job delayed, the cache update may still >>>>>> see >>>>>>>> the >>>>>>>>>>>> stale data. >>>>>>>>>>>> >>>>>>>>>>>> 3. 
In DefaultLookupCacheFactory, it looks like InitialCapacity should be removed.
>>>>>>>>>>>>
>>>>>>>>>>>> 4. The purpose of LookupFunctionProvider#cacheMissingKey() seems a
>>>>>>>>>>>> little confusing to me. If Optional<LookupCacheFactory>
>>>>>>>>>>>> getCacheFactory() returns a non-empty factory, doesn't that already
>>>>>>>>>>>> indicate that the framework should cache the missing keys? Also, why
>>>>>>>>>>>> is this method returning an Optional<Boolean> instead of boolean?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 25, 2022 at 5:07 PM Qingsheng Ren <renqs...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Lincoln and Jark,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the comments! If the community reaches a consensus that we
>>>>>>>>>>>>> use a SQL hint instead of table options to decide whether to use sync
>>>>>>>>>>>>> or async mode, it's indeed not necessary to introduce the
>>>>>>>>>>>>> "lookup.async" option.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it's a good idea to let the decision of async be made at the
>>>>>>>>>>>>> query level, which could enable better optimization with more
>>>>>>>>>>>>> information gathered by the planner. Is there any FLIP describing the
>>>>>>>>>>>>> issue in FLINK-27625? I thought FLIP-234 only proposes adding a SQL
>>>>>>>>>>>>> hint for retry on missing, rather than having the entire async mode
>>>>>>>>>>>>> controlled by hints.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Qingsheng
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On May 25, 2022, at 15:13, Lincoln Lee <lincoln.8...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jark,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your reply!
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Currently 'lookup.async' just lies in HBase connector, I have >>>>>>> no >>>>>>>>> idea >>>>>>>>>>>>>> whether or when to remove it (we can discuss it in another >>>>>>> issue >>>>>>>>> for >>>>>>>>>>> the >>>>>>>>>>>>>> HBase connector after FLINK-27625 is done), just not add it >>>>>>> into >>>>>>>> a >>>>>>>>>>>>> common >>>>>>>>>>>>>> option now. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Lincoln Lee >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jark Wu <imj...@gmail.com> 于2022年5月24日周二 20:14写道: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Lincoln, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have taken a look at FLIP-234, and I agree with you that >>>>>> the >>>>>>>>>>>>> connectors >>>>>>>>>>>>>>> can >>>>>>>>>>>>>>> provide both async and sync runtime providers simultaneously >>>>>>>>> instead >>>>>>>>>>>>> of one >>>>>>>>>>>>>>> of them. >>>>>>>>>>>>>>> At that point, "lookup.async" looks redundant. If this >>>>>> option >>>>>>> is >>>>>>>>>>>>> planned to >>>>>>>>>>>>>>> be removed >>>>>>>>>>>>>>> in the long term, I think it makes sense not to introduce it >>>>>>> in >>>>>>>>> this >>>>>>>>>>>>> FLIP. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, 24 May 2022 at 11:08, Lincoln Lee < >>>>>>>> lincoln.8...@gmail.com >>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sorry for jumping into the discussion so late. It's a good >>>>>>> idea >>>>>>>>>>> that >>>>>>>>>>>>> we >>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>> have a common table option. 
I have a minor comment on 'lookup.async': that we not make it a common option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The table layer abstracts both sync and async lookup capabilities;
>>>>>>>>>>>>>>>> connector implementers can choose one or both. In the case of
>>>>>>>>>>>>>>>> implementing only one capability (the status of most existing
>>>>>>>>>>>>>>>> builtin connectors), 'lookup.async' will not be used. And when a
>>>>>>>>>>>>>>>> connector has both capabilities, I think this choice is more
>>>>>>>>>>>>>>>> suitable for making decisions at the query level; for example, the
>>>>>>>>>>>>>>>> table planner can choose the physical implementation of async
>>>>>>>>>>>>>>>> lookup or sync lookup based on its cost model, or users can give a
>>>>>>>>>>>>>>>> query hint based on their own better understanding. If there is
>>>>>>>>>>>>>>>> another common table option 'lookup.async', it may confuse the
>>>>>>>>>>>>>>>> users in the long run.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, I prefer to leave the 'lookup.async' option in a private place
>>>>>>>>>>>>>>>> (for the current HBase connector) and not turn it into a common
>>>>>>>>>>>>>>>> option.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Lincoln Lee
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> wrote on Mon, May 23, 2022 at 14:54:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Alexander,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the review! We recently updated the FLIP and you can
>>>>>>>>>>>>>>>>> find those changes in my latest email. Since some terminology has
>>>>>>>>>>>>>>>>> changed, I'll use the new concepts when replying to your comments.
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. Builder vs ‘of’ >>>>>>>>>>>>>>>>> I’m OK to use builder pattern if we have additional >>>>>> optional >>>>>>>>>>>>> parameters >>>>>>>>>>>>>>>>> for full caching mode (“rescan” previously). The >>>>>>>>>>> schedule-with-delay >>>>>>>>>>>>>>> idea >>>>>>>>>>>>>>>>> looks reasonable to me, but I think we need to redesign >>>>>> the >>>>>>>>>>> builder >>>>>>>>>>>>> API >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> full caching to make it more descriptive for developers. >>>>>>> Would >>>>>>>>> you >>>>>>>>>>>>> mind >>>>>>>>>>>>>>>>> sharing your ideas about the API? For accessing the FLIP >>>>>>>>> workspace >>>>>>>>>>>>> you >>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>> just provide your account ID and ping any PMC member >>>>>>> including >>>>>>>>>>> Jark. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>> We have some discussions these days and propose to >>>>>>> introduce 8 >>>>>>>>>>> common >>>>>>>>>>>>>>>>> table options about caching. It has been updated on the >>>>>>> FLIP. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>> I think we are on the same page :-) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For your additional concerns: >>>>>>>>>>>>>>>>> 1) The table option has been updated. >>>>>>>>>>>>>>>>> 2) We got “lookup.cache” back for configuring whether to >>>>>> use >>>>>>>>>>> partial >>>>>>>>>>>>> or >>>>>>>>>>>>>>>>> full caching mode. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Qingsheng >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On May 19, 2022, at 17:25, Александр Смирнов < >>>>>>>>>>> smirale...@gmail.com> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Also I have a few additions: >>>>>>>>>>>>>>>>>> 1) maybe rename 'lookup.cache.maximum-size' to >>>>>>>>>>>>>>>>>> 'lookup.cache.max-rows'? 
I think it will be more clear >>>>>> that >>>>>>>> we >>>>>>>>>>> talk >>>>>>>>>>>>>>>>>> not about bytes, but about the number of rows. Plus it >>>>>> fits >>>>>>>>> more, >>>>>>>>>>>>>>>>>> considering my optimization with filters. >>>>>>>>>>>>>>>>>> 2) How will users enable rescanning? Are we going to >>>>>>> separate >>>>>>>>>>>>> caching >>>>>>>>>>>>>>>>>> and rescanning from the options point of view? Like >>>>>>> initially >>>>>>>>> we >>>>>>>>>>> had >>>>>>>>>>>>>>>>>> one option 'lookup.cache' with values LRU / ALL. I think >>>>>>> now >>>>>>>> we >>>>>>>>>>> can >>>>>>>>>>>>>>>>>> make a boolean option 'lookup.rescan'. RescanInterval can >>>>>>> be >>>>>>>>>>>>>>>>>> 'lookup.rescan.interval', etc. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> чт, 19 мая 2022 г. в 14:50, Александр Смирнов < >>>>>>>>>>> smirale...@gmail.com >>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi Qingsheng and Jark, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. Builders vs 'of' >>>>>>>>>>>>>>>>>>> I understand that builders are used when we have >>>>>> multiple >>>>>>>>>>>>>>> parameters. >>>>>>>>>>>>>>>>>>> I suggested them because we could add parameters later. >>>>>> To >>>>>>>>>>> prevent >>>>>>>>>>>>>>>>>>> Builder for ScanRuntimeProvider from looking redundant I >>>>>>> can >>>>>>>>>>>>> suggest >>>>>>>>>>>>>>>>>>> one more config now - "rescanStartTime". >>>>>>>>>>>>>>>>>>> It's a time in UTC (LocalTime class) when the first >>>>>> reload >>>>>>>> of >>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>> starts. This parameter can be thought of as >>>>>> 'initialDelay' >>>>>>>>> (diff >>>>>>>>>>>>>>>>>>> between current time and rescanStartTime) in method >>>>>>>>>>>>>>>>>>> ScheduleExecutorService#scheduleWithFixedDelay [1] . 
It >>>>>>> can >>>>>>>> be >>>>>>>>>>> very >>>>>>>>>>>>>>>>>>> useful when the dimension table is updated by some other >>>>>>>>>>> scheduled >>>>>>>>>>>>>>> job >>>>>>>>>>>>>>>>>>> at a certain time. Or when the user simply wants a >>>>>> second >>>>>>>> scan >>>>>>>>>>>>>>> (first >>>>>>>>>>>>>>>>>>> cache reload) be delayed. This option can be used even >>>>>>>> without >>>>>>>>>>>>>>>>>>> 'rescanInterval' - in this case 'rescanInterval' will be >>>>>>> one >>>>>>>>>>> day. >>>>>>>>>>>>>>>>>>> If you are fine with this option, I would be very glad >>>>>> if >>>>>>>> you >>>>>>>>>>> would >>>>>>>>>>>>>>>>>>> give me access to edit FLIP page, so I could add it >>>>>> myself >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2. Common table options >>>>>>>>>>>>>>>>>>> I also think that FactoryUtil would be overloaded by all >>>>>>>> cache >>>>>>>>>>>>>>>>>>> options. But maybe unify all suggested options, not only >>>>>>> for >>>>>>>>>>>>> default >>>>>>>>>>>>>>>>>>> cache? I.e. class 'LookupOptions', that unifies default >>>>>>>> cache >>>>>>>>>>>>>>> options, >>>>>>>>>>>>>>>>>>> rescan options, 'async', 'maxRetries'. WDYT? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 3. Retries >>>>>>>>>>>>>>>>>>> I'm fine with suggestion close to >>>>>>> RetryUtils#tryTimes(times, >>>>>>>>>>> call) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit- >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>>>>> Alexander >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ср, 18 мая 2022 г. в 16:04, Qingsheng Ren < >>>>>>>> renqs...@gmail.com >>>>>>>>>> : >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Jark and Alexander, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for your comments! 
I’m also OK to introduce >>>>>> common >>>>>>>>> table >>>>>>>>>>>>>>>>> options. I prefer to introduce a new >>>>>>> DefaultLookupCacheOptions >>>>>>>>>>> class >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> holding these option definitions because putting all >>>>>> options >>>>>>>>> into >>>>>>>>>>>>>>>>> FactoryUtil would make it a bit ”crowded” and not well >>>>>>>>>>> categorized. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> FLIP has been updated according to suggestions above: >>>>>>>>>>>>>>>>>>>> 1. Use static “of” method for constructing >>>>>>>>>>> RescanRuntimeProvider >>>>>>>>>>>>>>>>> considering both arguments are required. >>>>>>>>>>>>>>>>>>>> 2. Introduce new table options matching >>>>>>>>>>> DefaultLookupCacheFactory >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> Qingsheng >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, May 18, 2022 at 2:57 PM Jark Wu < >>>>>>> imj...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Alex, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1) retry logic >>>>>>>>>>>>>>>>>>>>> I think we can extract some common retry logic into >>>>>>>>> utilities, >>>>>>>>>>>>>>> e.g. >>>>>>>>>>>>>>>>> RetryUtils#tryTimes(times, call). >>>>>>>>>>>>>>>>>>>>> This seems independent of this FLIP and can be reused >>>>>> by >>>>>>>>>>>>>>> DataStream >>>>>>>>>>>>>>>>> users. >>>>>>>>>>>>>>>>>>>>> Maybe we can open an issue to discuss this and where >>>>>> to >>>>>>>> put >>>>>>>>>>> it. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2) cache ConfigOptions >>>>>>>>>>>>>>>>>>>>> I'm fine with defining cache config options in the >>>>>>>>> framework. >>>>>>>>>>>>>>>>>>>>> A candidate place to put is FactoryUtil which also >>>>>>>> includes >>>>>>>>>>>>>>>>> "sink.parallelism", "format" options. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, 18 May 2022 at 13:52, Александр Смирнов < >>>>>>>>>>>>>>>> smirale...@gmail.com> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Qingsheng, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thank you for considering my comments. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> there might be custom logic before making retry, >>>>>> such >>>>>>> as >>>>>>>>>>>>>>>>> re-establish the connection >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Yes, I understand that. I meant that such logic can >>>>>> be >>>>>>>>>>> placed in >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>> separate function, that can be implemented by >>>>>>> connectors. >>>>>>>>>>> Just >>>>>>>>>>>>>>>> moving >>>>>>>>>>>>>>>>>>>>>> the retry logic would make connector's LookupFunction >>>>>>>> more >>>>>>>>>>>>>>> concise >>>>>>>>>>>>>>>> + >>>>>>>>>>>>>>>>>>>>>> avoid duplicate code. However, it's a minor change. >>>>>> The >>>>>>>>>>> decision >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>> up >>>>>>>>>>>>>>>>>>>>>> to you. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> We decide not to provide common DDL options and let >>>>>>>>>>> developers >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> define their own options as we do now per connector. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> What is the reason for that? One of the main goals of >>>>>>>> this >>>>>>>>>>> FLIP >>>>>>>>>>>>>>> was >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>> unify the configs, wasn't it? I understand that >>>>>> current >>>>>>>>> cache >>>>>>>>>>>>>>>> design >>>>>>>>>>>>>>>>>>>>>> doesn't depend on ConfigOptions, like was before. 
But we can still put these options into the framework, so connectors can reuse them, avoiding code duplication and, more significantly, avoiding divergent option naming. This can be pointed out in the documentation for connector developers.

Best regards,
Alexander

On Tue, 17 May 2022 at 17:11, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander,

Thanks for the review, and glad to see we are on the same page! I think you forgot to cc the dev mailing list, so I'm also quoting your reply under this email.

> We can add a 'maxRetryTimes' option into this class

In my opinion the retry logic should be implemented in lookup() instead of in LookupFunction#eval(). Retrying is only meaningful under specific retriable failures, and there might be custom logic before making a retry, such as re-establishing the connection (JdbcRowDataLookupFunction is an example), so it's more handy to leave it to the connector.

> I don't see the DDL options that were in the previous version of the FLIP.
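Qingsheng's point above (retry belongs in lookup(), with connector-owned reconnect logic, while eval() stays thin) can be sketched as follows. These are simplified stand-in types, not the FLIP's actual interfaces, and the flaky-backend simulation is purely illustrative:

```java
import java.util.Collection;
import java.util.Collections;

// Simplified stand-in for a row; NOT Flink's RowData.
class Row {
    final Object[] fields;
    Row(Object... fields) { this.fields = fields; }
}

// Sketch of the proposed shape: the framework-facing eval() delegates,
// while the connector implements retry (and reconnect) inside lookup().
abstract class SketchLookupFunction {
    public Collection<Row> eval(Object... keys) throws Exception {
        return lookup(new Row(keys));
    }
    public abstract Collection<Row> lookup(Row keyRow) throws Exception;
}

class RetryingLookupFunction extends SketchLookupFunction {
    private final int maxRetryTimes;
    private int failuresToSimulate; // stands in for a flaky connection

    RetryingLookupFunction(int maxRetryTimes, int failuresToSimulate) {
        this.maxRetryTimes = maxRetryTimes;
        this.failuresToSimulate = failuresToSimulate;
    }

    @Override
    public Collection<Row> lookup(Row keyRow) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return queryBackend(keyRow);
            } catch (Exception e) {
                if (attempt >= maxRetryTimes) {
                    throw e; // retries exhausted: surface the failure
                }
                reconnect(); // connector-specific recovery before retrying
            }
        }
    }

    private Collection<Row> queryBackend(Row keyRow) throws Exception {
        if (failuresToSimulate > 0) {
            failuresToSimulate--;
            throw new RuntimeException("connection lost");
        }
        return Collections.singletonList(new Row("value-for-" + keyRow.fields[0]));
    }

    private void reconnect() { /* re-establish the connection here */ }
}
```

A JdbcRowDataLookupFunction-style connector would put its real reconnect and query code behind queryBackend()/reconnect().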
> Do you have any special plans for them?

We decided not to provide common DDL options and to let developers define their own options as we do now per connector.

The rest of the comments sound great and I'll update the FLIP. Hope we can finalize our proposal soon!

Best,
Qingsheng

On May 17, 2022, at 13:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng and devs!

I like the overall design of the updated FLIP; however, I have several suggestions and questions.

1) Introducing LookupFunction as a subclass of TableFunction is a good idea. We can add a 'maxRetryTimes' option into this class. The 'eval' method of the new LookupFunction is great for this purpose. The same goes for the 'async' case.

2) There might be other configs in the future, such as 'cacheMissingKey' in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider.
Maybe use the Builder pattern in LookupFunctionProvider and RescanRuntimeProvider for more flexibility (one 'build' method instead of many 'of' methods in the future)?

3) What are the plans for the existing TableFunctionProvider and AsyncTableFunctionProvider? I think they should be deprecated.

4) Am I right that the current design does not assume usage of a user-provided LookupCache in re-scanning? In that case, it is not very clear why we need methods such as 'invalidate' or 'putAll' in LookupCache.

5) I don't see the DDL options that were in the previous version of the FLIP. Do you have any special plans for them?

If you don't mind, I would be glad to make small adjustments to the FLIP document too. I think it's worth mentioning exactly which optimizations are planned for the future.

Best regards,
Smirnov Alexander

On Fri, 13 May 2022 at 20:27, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander and devs,

Thank you very much for the in-depth discussion!
As Jark mentioned, we were inspired by Alexander's idea and refactored our design. FLIP-221 [1] has been updated to reflect the design now, and we are happy to hear more suggestions from you!

Compared to the previous design:
1. The lookup cache serves at the table runtime level and is integrated as a component of LookupJoinRunner, as discussed previously.
2. Interfaces are renamed and re-designed to reflect the new design.
3. We separate the all-caching case and introduce a new RescanRuntimeProvider to reuse the ability of scanning. We are planning to support SourceFunction / InputFormat for now, considering the complexity of the FLIP-27 Source API.
4. A new interface LookupFunction is introduced to make the semantics of lookup more straightforward for developers.

Replying to Alexander:

> However I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not?

Yes, you are right. InputFormat is not deprecated for now. I think it will be deprecated in the future, but we don't have a clear plan for that.
Thanks again for the discussion on this FLIP, and looking forward to cooperating with you after we finalize the design and interfaces!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,
Qingsheng

On Fri, May 13, 2022 at 12:12 AM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark, Qingsheng and Leonard!

Glad to see that we came to a consensus on almost all points!

However, I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not? Actually, I also think that for the first version it's OK to use InputFormat in the ALL cache realization, because supporting rescan ability seems like a very distant prospect. But for this decision we need a consensus among all discussion participants.
In general, I don't have anything to argue with in your statements; all of them correspond to my ideas. Looking ahead, it would be nice to work on this FLIP cooperatively. I've already done a lot of work on lookup join caching, with a realization very close to the one we are discussing, and I want to share the results of this work. Anyway, looking forward to the FLIP update!

Best regards,
Smirnov Alexander

On Thu, 12 May 2022 at 17:38, Jark Wu <imj...@gmail.com> wrote:

Hi Alex,

Thanks for summarizing your points.

In the past week, Qingsheng, Leonard, and I have discussed it several times, and we have totally refactored the design. I'm glad to say we have reached a consensus on many of your points! Qingsheng is still working on updating the design docs, which may be available in the next few days.
I will share some conclusions from our discussions:

1) We have refactored the design towards the "cache in framework" way.

2) There is a "LookupCache" interface for users to customize, and a default implementation with a builder for ease of use. This makes it possible to have both flexibility and conciseness.

3) Filter pushdown is important for both ALL and LRU lookup caches, especially for reducing IO. Filter pushdown should be the final state and the unified way to support pruning both the ALL cache and the LRU cache, so I think we should make an effort in this direction. If we need to support filter pushdown for the ALL cache anyway, why not use it for the LRU cache as well? Either way, as we decided to implement the cache in the framework, we have the chance to support filters on the cache anytime. This is an optimization and doesn't affect the public API. I think we can create a JIRA issue to discuss it when the FLIP is accepted.

4) The idea to support the ALL cache is similar to your proposal.
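Point 2 above (a LookupCache interface plus a default implementation with a builder) might look like this dependency-free sketch. The builder options shown (maximumSize, cacheMissingKey) are assumptions for illustration, not the FLIP's final API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "LookupCache interface + default implementation with
// builder" idea from the thread. The real DefaultLookupCache in the FLIP
// will differ; all names and options here are assumptions.
interface LookupCacheSketch<K, V> {
    V getIfPresent(K key);
    void put(K key, V value);
}

class DefaultLookupCacheSketch<K, V> implements LookupCacheSketch<K, V> {
    private final Map<K, V> entries;
    private final boolean cacheMissingKey;

    private DefaultLookupCacheSketch(int maximumSize, boolean cacheMissingKey) {
        this.cacheMissingKey = cacheMissingKey;
        // LinkedHashMap in access order gives a simple LRU eviction policy.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maximumSize;
            }
        };
    }

    @Override
    public V getIfPresent(K key) { return entries.get(key); }

    @Override
    public void put(K key, V value) {
        // Let the cache itself decide whether "missing" results are stored,
        // as discussed earlier in the thread.
        if (value == null && !cacheMissingKey) {
            return;
        }
        entries.put(key, value);
    }

    static <K, V> Builder<K, V> newBuilder() { return new Builder<>(); }

    static class Builder<K, V> {
        private int maximumSize = 10_000;
        private boolean cacheMissingKey = true;

        Builder<K, V> maximumSize(int size) { this.maximumSize = size; return this; }
        Builder<K, V> cacheMissingKey(boolean enabled) { this.cacheMissingKey = enabled; return this; }
        DefaultLookupCacheSketch<K, V> build() {
            return new DefaultLookupCacheSketch<>(maximumSize, cacheMissingKey);
        }
    }
}
```

The builder keeps the many constructor arguments readable, which is the reason given later in the thread for keeping a builder on DefaultLookupCache.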
In the first version, we will only support InputFormat and SourceFunction for cache-all (invoking the InputFormat in the join operator). For a FLIP-27 source, we need to join a true source operator instead of calling it embedded in the join operator. However, this needs another FLIP to support the re-scan ability for the FLIP-27 Source, and that can be a large work. In order not to block this issue, we can put the effort of FLIP-27 source integration into future work and integrate InputFormat & SourceFunction for now.

I think it's fine to use InputFormat & SourceFunction, as they are not deprecated; otherwise, we would have to introduce another function similar to them, which is meaningless. We need to plan FLIP-27 source integration ASAP, before InputFormat & SourceFunction are deprecated.

Best,
Jark

On Thu, 12 May 2022 at 15:46, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Martijn!

Got it.
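Jark's cache-all plan above (the join operator itself invoking the InputFormat and periodically reloading the whole table) amounts to a loop like this toy sketch, with a plain Iterator standing in for Flink's InputFormat:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy sketch of "invoke InputFormat in the join operator" for the ALL
// cache: on each reload, fully re-scan the source and atomically swap the
// snapshot. The Iterator stands in for Flink's InputFormat here; all
// names are assumptions for illustration.
class FullCacheHolder {
    private volatile Map<String, String> cache = new HashMap<>();

    // Re-scan everything and replace the old snapshot in one step, so
    // concurrent lookups never observe a half-loaded cache.
    void reload(Iterator<String[]> scan) {
        Map<String, String> fresh = new HashMap<>();
        while (scan.hasNext()) {
            String[] kv = scan.next();
            fresh.put(kv[0], kv[1]);
        }
        cache = fresh;
    }

    String lookup(String key) { return cache.get(key); }
}
```

A reload trigger (periodic or timed, as in the FLIP's later revisions) would decide when reload() runs.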
Therefore, the realization with InputFormat is not considered. Thanks for clearing that up!

Best regards,
Smirnov Alexander

On Thu, 12 May 2022 at 14:23, Martijn Visser <mart...@ververica.com> wrote:

Hi,

With regards to:

> But if there are plans to refactor all connectors to FLIP-27

Yes, FLIP-27 is the target for all connectors. The old interfaces will be deprecated, and connectors will either be refactored to use the new ones or dropped.

The caching should work for connectors that are using FLIP-27 interfaces; we should not introduce new features for old interfaces.

Best regards,
Martijn

On Thu, 12 May 2022 at 06:19, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Jark!

Sorry for the late response.
I would like to make some comments and clarify my points.

1) I agree with your first statement. I think we can achieve both advantages this way: put the Cache interface in flink-table-common, but have implementations of it in flink-table-runtime. Then, if a connector developer wants to use existing cache strategies and their implementations, he can just pass a lookupConfig to the planner; but if he wants his own cache implementation in his TableFunction, it will be possible for him to use the existing interface for this purpose (we can explicitly point this out in the documentation). In this way all configs and metrics will be unified. WDYT?

> If a filter can prune 90% of data in the cache, we will have 90% of lookup requests that can never be cached

2) Let me clarify the filters optimization logic in the case of an LRU cache. It looks like Cache<RowData, Collection<RowData>>.
Here we always store the response of the dimension table in the cache, even after applying the calc function. I.e., if there are no rows left after applying filters to the result of the 'eval' method of the TableFunction, we store an empty list under the lookup keys. Therefore the cache line will be filled, but will require much less memory (in bytes). I.e., we don't completely filter out keys whose result was pruned, but we significantly reduce the memory required to store that result. If the user knows about this behavior, he can increase the 'max-rows' option before the start of the job. But actually I came up with the idea that we can do this automatically by using the 'maximumWeight' and 'weigher' methods of the Guava Cache [1]. The weight can be the size of the collection of rows (the cache value). Therefore the cache can automatically fit many more records than before.
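Alexander's weigher idea can be illustrated without Guava: bound the cache by the total number of cached rows rather than by entry count, so that empty (filtered-out) results are nearly free. Guava's CacheBuilder.maximumWeight()/weigher() [1] implements this properly; the toy FIFO version below only shows the weight accounting and is not how Guava evicts internally:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of weight-based capacity for a lookup cache: the weight of
// an entry is the number of rows in its value, so a cached "miss" (empty
// collection) weighs 0 and never triggers eviction. FIFO eviction is used
// here only to keep the sketch short.
class WeightedRowCache<K> {
    private final long maximumWeight;
    private long currentWeight = 0;
    private final Map<K, Collection<String>> entries = new HashMap<>();
    private final Deque<K> insertionOrder = new ArrayDeque<>();

    WeightedRowCache(long maximumWeight) { this.maximumWeight = maximumWeight; }

    void put(K key, Collection<String> rows) {
        Collection<String> previous = entries.put(key, rows);
        if (previous != null) {
            currentWeight -= previous.size(); // re-put: drop the old weight
            insertionOrder.remove(key);
        }
        insertionOrder.addLast(key);
        currentWeight += rows.size();
        // Evict oldest entries until the total row count fits again.
        while (currentWeight > maximumWeight && !insertionOrder.isEmpty()) {
            K eldest = insertionOrder.removeFirst();
            Collection<String> evicted = entries.remove(eldest);
            if (evicted != null) {
                currentWeight -= evicted.size();
            }
        }
    }

    Collection<String> getIfPresent(K key) { return entries.get(key); }
}
```

With Guava, the same policy is roughly `CacheBuilder.newBuilder().maximumWeight(n).weigher((key, rows) -> rows.size()).build()`, as the linked Javadoc describes.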
> Flink SQL has provided a standard way to do filter and projection pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown. That Jdbc/hive/HBase haven't implemented the interfaces doesn't mean it's hard to implement.

It's debatable how difficult it will be to implement filter pushdown. But I think the fact that currently there is no database connector with filter pushdown at least means that this feature won't be supported soon in the connectors. Moreover, if we talk about other connectors (not in the Flink repo), their databases might not support all Flink filters (or not support filters at all). I think users are interested in supporting the cache filters optimization independently of support for other features and the solving of more complex (or unsolvable) problems.

3) I agree with your third statement.
Actually, in our internal version I also tried to unify the logic of scanning and reloading data from connectors. But unfortunately, I didn't find a way to unify the logic of all ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...) and reuse it for reloading the ALL cache. As a result I settled on using InputFormat, because it was used for scanning in all lookup connectors. (I didn't know that there are plans to deprecate InputFormat in favor of the FLIP-27 Source.) IMO, usage of the FLIP-27 source in ALL caching is not a good idea, because this source was designed to work in a distributed environment (SplitEnumerator on the JobManager and SourceReaders on the TaskManagers), not in one operator (the lookup join operator in our case). There is not even a direct way to pass splits from the SplitEnumerator to a SourceReader (this logic works through the SplitEnumeratorContext, which requires OperatorCoordinator.SubtaskGateway to send AddSplitEvents).
Usage of InputFormat for the ALL cache seems much clearer and easier. But if there are plans to refactor all connectors to FLIP-27, I have the following idea: maybe we can give up the lookup join ALL cache in favor of a simple join with multiple scans of the batch source? The point is that the only difference between a lookup join ALL cache and a simple join with a batch source is that in the first case scanning is performed multiple times, in between which the state (cache) is cleared (correct me if I'm wrong). So what if we extend the functionality of the simple join to support state reloading, and extend the functionality of scanning a batch source multiple times (this one should be easy with the new FLIP-27 source, which unifies streaming/batch reading; we would only need to change the SplitEnumerator so that it passes the splits again after some TTL)? WDYT? I must say that this looks like a long-term goal and would make the scope of this FLIP even larger than you said.
Maybe we can limit ourselves to a simpler solution (InputFormats) for now.

So to sum up, my points are:
1) There is a way to make both concise and flexible interfaces for caching in lookup join.
2) The cache filters optimization is important in both LRU and ALL caches.
3) It is unclear when filter pushdown will be supported in Flink connectors; some connectors might not have the opportunity to support filter pushdown. Also, as far as I know, filter pushdown currently works only for scanning (not lookup). So the cache filters + projections optimization should be independent of other features.
4) The ALL cache realization is a complex topic that involves multiple aspects of how Flink is developing. Giving up InputFormat in favor of the FLIP-27 Source will make the ALL cache realization really complex and unclear, so maybe instead we can extend the functionality of the simple join, or keep InputFormat in the case of the lookup join ALL cache?
Best regards,
Smirnov Alexander

[1] https://guava.dev/releases/18.0/api/docs/com/google/common/cache/CacheBuilder.html#weigher(com.google.common.cache.Weigher)

On Thu, 5 May 2022 at 20:34, Jark Wu <imj...@gmail.com> wrote:

It's great to see the active discussion! I want to share my ideas:

1) Implement the cache in the framework vs. in the connector base
I don't have a strong opinion on this; both ways should work (e.g., for cache pruning and compatibility). The framework way can provide more concise interfaces. The connector-base way can define more flexible cache strategies/implementations. We are still investigating whether we can have both advantages.
We should reach a consensus that the chosen way should be a final state, and that we are on the path to it.

2) Filter and projection pushdown:
I agree with Alex that filter pushdown into the cache can benefit the ALL cache a lot. However, this is not true for the LRU cache. Connectors use the cache to reduce IO requests to databases for better throughput. If a filter can prune 90% of the data in the cache, we will have 90% of lookup requests that can never be cached and hit the databases directly. That means the cache is meaningless in this case.

IMO, Flink SQL has provided a standard way to do filter and projection pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown. That Jdbc/hive/HBase haven't implemented the interfaces doesn't mean it's hard to implement.
They should implement the pushdown interfaces to reduce I/O and the cache size. The final state should be that the scan source and the lookup source share the exact same pushdown implementation. I don't see why we need to duplicate the pushdown logic in caches, which would complicate the lookup join design.

3) ALL cache abstraction
The ALL cache might be the most challenging part of this FLIP. We have never provided a public reload-lookup interface. Currently, we put the reload logic in the "eval" method of TableFunction. That's hard for some sources (e.g., Hive). Ideally, a connector implementation should share the logic of reload and scan, i.e. ScanTableSource with InputFormat/SourceFunction/FLIP-27 Source. However, InputFormat/SourceFunction are deprecated, and the FLIP-27 source is deeply coupled with SourceOperator.
If we want to invoke the FLIP-27 source in LookupJoin, this may make the scope of this FLIP much larger. We are still investigating how to abstract the ALL cache logic and reuse the existing source interfaces.

Best,
Jark

On Thu, 5 May 2022 at 20:22, Roman Boyko <ro.v.bo...@gmail.com> wrote:

It's a much more complicated activity and lies outside the scope of this improvement, because such pushdowns would have to be done for all ScanTableSource implementations (not only the lookup ones).

On Thu, 5 May 2022 at 19:02, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

One question regarding "And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase." -> Would an alternative solution be to actually implement these filter pushdowns? I can imagine that there are many more benefits to doing that, outside of lookup caching and metrics.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

On Thu, 5 May 2022 at 13:58, Roman Boyko <ro.v.bo...@gmail.com> wrote:

Hi everyone!

Thanks for driving such a valuable improvement!

I do think that a single cache implementation would be a nice opportunity for users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics anyway — no matter how it is implemented.
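The SupportsFilterPushDown contract referenced in this thread boils down to splitting the query's predicates into a set the source accepts (evaluated inside the database) and a remaining set the framework keeps evaluating. In miniature, with plain Java and made-up names (Flink's real interface works on ResolvedExpression objects, not strings):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of the accepted/remaining split behind filter pushdown.
// Names are illustrative, not the actual Flink interface.
class FilterSplit {
    final List<String> acceptedFilters = new ArrayList<>();   // evaluated inside the source/database
    final List<String> remainingFilters = new ArrayList<>();  // still evaluated by the framework

    static FilterSplit applyFilters(List<String> filters, Predicate<String> sourceCanTranslate) {
        FilterSplit result = new FilterSplit();
        for (String filter : filters) {
            (sourceCanTranslate.test(filter) ? result.acceptedFilters : result.remainingFilters)
                    .add(filter);
        }
        return result;
    }
}
```

A connector that cannot translate a predicate simply leaves it in the remaining set, which is why partial pushdown support is still useful.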
Putting myself in the user's shoes, I can say that:
1) I would prefer to have the opportunity to cut down the cache size by simply filtering out unnecessary data, and the most handy way to do that is to apply the filter inside the LookupRunners. It would be a bit harder to pass it through the LookupJoin node down to the TableFunction. And Alexander correctly mentioned that filter pushdown is still not implemented for jdbc/hive/hbase.
2) The ability to set different caching parameters for different tables is quite important, so I would prefer to set them through DDL rather than have the same TTL, strategy and other options for all lookup tables.
3) Moving the cache into the framework deprives us of extensibility (users won't be able to implement their own cache). But most probably that can be mitigated by providing more cache strategies and a wider set of configurations.

All these points are much closer to the design proposed by Alexander. Qingsheng Ren, please correct me if I'm wrong — could all these facilities be simply implemented in your architecture?
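The extensibility concern in point 3 is essentially about making the cache a pluggable contract rather than a fixed framework component. A minimal sketch of such a contract, loosely inspired by the LookupCache idea from the FLIP (all names here are illustrative, not the real API): the implementation itself decides policy questions such as whether to store missing keys (empty lookup results):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative pluggable cache contract; a sketch, not the FLIP's actual interface.
interface PluggableLookupCache {
    Collection<String> getIfPresent(String key);

    Collection<String> put(String key, Collection<String> rows);
}

// One possible custom strategy: never cache missing keys (empty lookup results),
// trading repeated backend lookups for a smaller cache.
class NoNegativeCaching implements PluggableLookupCache {
    private final Map<String, Collection<String>> store = new HashMap<>();

    public Collection<String> getIfPresent(String key) { return store.get(key); }

    public Collection<String> put(String key, Collection<String> rows) {
        if (!rows.isEmpty()) {
            store.put(key, rows); // the cache itself decides whether empty results are stored
        }
        return rows;
    }
}
```

With such an interface, a connector (or user) swaps strategies by providing a different implementation instead of waiting for new framework options.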
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com

On Wed, 4 May 2022 at 21:01, Martijn Visser <martijnvis...@apache.org> wrote:

Hi everyone,

I don't have much to chip in, but just wanted to express that I really appreciate the in-depth discussion on this topic and I hope that others will join the conversation.

Best regards,

Martijn

On Tue, 3 May 2022 at 10:15, Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng, Leonard and Jark,

Thanks for your detailed feedback! However, I have questions about some of your statements (maybe I didn't get something?).
> Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time"

I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" are not fully preserved with caching, but as you said, users accept that trade-off consciously to achieve better performance (no one proposed enabling caching by default, etc.). Or by "users" do you mean other developers of connectors? In that case developers explicitly specify whether their connector supports caching or not (in the list of supported options); no one makes them do that if they don't want to.
So what exactly is the difference, from this point of view, between implementing caching in flink-table-runtime versus in flink-table-common? How does it affect whether the semantics of "FOR SYSTEM_TIME AS OF proc_time" are broken or not?

> confront a situation that allows table options in DDL to control the behavior of the framework, which has never happened previously and should be cautious

If we talk about the main semantic difference between DDL options and config options ("table.exec.xxx"), isn't it about the scope of the option and its importance to the user's business logic, rather than the specific location of the corresponding logic in the framework?
I mean that in my design, for example, putting the lookup cache strategy into configurations would be the wrong decision, because it directly affects the user's business logic (not just performance optimization) and touches just a few functions of ONE table (there can be multiple tables with different caches). Does it really matter to the user (or anyone else) where the logic affected by the option is located? Also, recall the DDL option 'sink.parallelism', which in some way "controls the behavior of the framework", and I don't see any problem there.
> introduce a new interface for this all-caching scenario and the design would become more complex

This is a subject for a separate discussion, but in our internal version we actually solved this problem quite easily — we reused the InputFormat class (so there is no need for a new API). The point is that currently all lookup connectors use InputFormat for scanning the data in batch mode: HBase, JDBC and even Hive — the latter uses the PartitionReader class, which is actually just a wrapper around InputFormat.
The advantage of this solution is the ability to reload cache data in parallel (the number of threads depends on the number of InputSplits, with an upper limit). As a result, cache reload time is significantly reduced (as is the time the input stream is blocked). I know we usually try to avoid concurrency in Flink code, but maybe this case can be an exception. BTW, I don't claim it's an ideal solution; maybe there are better ones.
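The per-split parallel reload described above can be sketched with a plain thread pool: one task per "split", each writing into a shared concurrent map. This is only an illustration of the idea, not the internal implementation — real code would read rows through InputFormat and handle failures and retries:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative parallel full-cache reload: one task per split, as with InputSplits.
class ParallelCacheReloader {
    static Map<String, String> reload(List<Map<String, String>> splits, int maxThreads) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        int threads = Math.max(1, Math.min(maxThreads, splits.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Map<String, String> split : splits) {
            pool.execute(() -> cache.putAll(split)); // each split loads concurrently
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("cache reload timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during cache reload", e);
        }
        return cache;
    }
}
```

The upper bound on threads mirrors the "depends on the number of InputSplits, with an upper limit" point: a table with many splits still reloads with a bounded pool.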
> Providing the cache in the framework might introduce compatibility issues

That's possible only if the developer of the connector doesn't properly refactor his code and uses the new cache options incorrectly (i.e. explicitly wires the same options into two different places). For correct behavior, all he needs to do is redirect the existing options to the framework's LookupConfig (plus maybe add an alias for options if the naming differed); everything will be transparent to users. If the developer doesn't refactor at all, nothing changes for the connector thanks to backward compatibility.
Also, if a developer wants to use his own cache logic, he can simply refuse to pass some of the configs to the framework and instead make his own implementation with the already existing configs and metrics (but I actually think that's a rare case).

> filters and projections should be pushed all the way down to the table function, like what we do in the scan source

That's a great goal. But the truth is that the ONLY connector that supports filter pushdown is FileSystemTableSource (no database connector supports it currently). Also, for some databases it's simply impossible to push down filters as complex as the ones we have in Flink.
> only applying these optimizations to the cache seems not quite useful

Filters can cut off an arbitrarily large amount of data from the dimension table. For a simple example, suppose dimension table 'users' has a column 'age' with values from 20 to 40, and the input stream 'clicks' is roughly uniformly distributed over users' ages. With the filter 'age > 30', there will be half as much data in the cache, which means the user can increase 'lookup.cache.max-rows' by almost 2x. That yields a huge performance boost.
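The arithmetic of the example above is easy to check: with distinct ages 20..40 (21 values), the filter 'age > 30' leaves only ages 31..40 cacheable. A tiny illustrative snippet (not connector code):

```java
import java.util.stream.IntStream;

// Checks the example above: with ages 20..40 and the filter 'age > 30',
// roughly half of the distinct ages remain cacheable.
class FilterPruningExample {
    static long cacheableAges(int minAge, int maxAge, int threshold) {
        return IntStream.rangeClosed(minAge, maxAge)
                .filter(age -> age > threshold)
                .count();
    }
}
```

So under a uniform age distribution, the same max-rows budget covers roughly twice as many hot keys once the filter is applied before caching.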
Moreover, this optimization really starts to shine with the 'ALL' cache, where tables without filters and projections can't fit in memory, but with them they can. This opens up additional possibilities for users — and that doesn't sound like "not quite useful".

It would be great to hear other voices on this topic! We have quite a lot of controversial points, and I think with the help of others it will be easier for us to come to a consensus.

Best regards,
Smirnov Alexander

On Fri, 29 Apr 2022 at 22:33, Qingsheng Ren <renqs...@gmail.com> wrote:

Hi Alexander and Arvid,

Thanks for the discussion and sorry for my late response!
We had an internal discussion together with Jark and Leonard, and I'd like to summarize our ideas. Instead of implementing the cache logic in the table runtime layer or wrapping it around the user-provided table function, we prefer to introduce some new APIs extending TableFunction, with these concerns:

1. Caching actually breaks the semantics of "FOR SYSTEM_TIME AS OF proc_time", because the result couldn't truly reflect the content of the lookup table at the moment of querying. If users choose to enable caching on the lookup table, they implicitly indicate that this breakage is acceptable in exchange for the performance.
So we prefer not to provide caching at the table runtime level.

2. If we put the cache implementation in the framework (whether in a runner or a wrapper around TableFunction), we have to confront a situation where table options in DDL control the behavior of the framework, which has never happened previously and should be treated with caution. Under the current design, the behavior of the framework should only be specified by configurations ("table.exec.xxx"), and it's hard to apply such general configs to one specific table.
3. We have use cases where the lookup source loads all records into memory and refreshes them periodically to achieve high lookup performance (like the Hive connector in the community, a pattern also widely used by our internal connectors). Wrapping the cache around the user's TableFunction works fine for LRU caches, but I think we would have to introduce a new interface for this all-caching scenario, and the design would become more complex.
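The "load all records and refresh periodically" pattern described here (what the updated FLIP exposes via a periodic reload trigger) can be sketched by swapping an immutable snapshot on a scheduled executor. This is a sketch of the idea only — the class and method names are invented, not the FLIP's actual API:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative "ALL cache": load everything up front, then refresh on a fixed
// period by swapping in a fresh snapshot. Lookups never block on a reload.
class PeriodicAllCache {
    private volatile Map<String, String> snapshot = Map.of();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Supplier<Map<String, String>> loader, long periodMillis) {
        snapshot = loader.get(); // initial synchronous load before serving lookups
        scheduler.scheduleAtFixedRate(
                () -> snapshot = loader.get(), periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    String lookup(String key) { return snapshot.get(key); }

    void stop() { scheduler.shutdownNow(); }
}
```

The snapshot swap is the key design choice: readers see either the old or the new table, never a half-reloaded one, which is what makes a framework-driven (possibly distributed) reload attractive.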
4. Providing the cache in the framework might introduce compatibility issues for existing lookup sources: two caches with totally different strategies could coexist if the user configures the table incorrectly (one in the framework and another implemented by the lookup source).

As for the optimization mentioned by Alexander, I think filters and projections should be pushed all the way down to the table function, like what we do in the scan source, instead of into the runner with the cache.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> goal of using cache is to reduce the >>>>>>>> network >>>>>>>>>>> I/O >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>>>>>>> pressure >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> external system, and only applying >>>>>> these >>>>>>>>>>>>>>>> optimizations >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not quite useful. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I made some updates to the FLIP[1] to >>>>>>>>> reflect >>>>>>>>>>> our >>>>>>>>>>>>>>>>> ideas. >>>>>>>>>>>>>>>>>>>>>>>>>>>> We >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prefer to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> keep the cache implementation as a part >>>>>>> of >>>>>>>>>>>>>>>>> TableFunction, >>>>>>>>>>>>>>>>>>>>>>>>>>>> and we >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> provide some helper classes >>>>>>>>>>> (CachingTableFunction, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AllCachingTableFunction, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CachingAsyncTableFunction) to >>>>>> developers >>>>>>>> and >>>>>>>>>>>>>>> regulate >>>>>>>>>>>>>>>>>>>>>>>>>>>> metrics >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> cache. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, I made a POC[2] for your >>>>>> reference. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your ideas! 
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric
[2] https://github.com/PatrickRen/flink/tree/FLIP-221

Best regards,
Qingsheng

On Tue, Apr 26, 2022 at 4:45 PM Александр Смирнов <smirale...@gmail.com> wrote:

Thanks for the response, Arvid!

I have a few comments on your message.
> but could also live with an easier solution as the first step

I think these two ways (the one originally proposed by Qingsheng, and mine) are mutually exclusive, because conceptually they pursue the same goal but differ in implementation details. If we go one way, moving to the other in the future will mean deleting existing code and once again changing the API for connectors. So I think we should first reach a consensus with the community on that, and then work on this FLIP together, i.e. divide the work into tasks for the different parts of the FLIP (for example, LRU cache unification / introducing the proposed set of metrics / further work…). WDYT, Qingsheng?

> as the source will only receive the requests after filter

Actually, if filters are applied to fields of the lookup table, we first have to do the requests, and only after that can we filter the responses, because lookup connectors don't have filter pushdown. So if filtering is done before caching, there will be far fewer rows in the cache.

> @Alexander unfortunately, your architecture is not shared. I don't know the solution to share images to be honest.
Sorry for that, I'm a bit new to these kinds of conversations :) I have no write access to the Confluence, so I made a Jira issue where I described the proposed changes in more detail: https://issues.apache.org/jira/browse/FLINK-27411.

I will be happy to get more feedback!

Best,
Smirnov Alexander

On Mon, Apr 25, 2022 at 19:49 Arvid Heise <ar...@apache.org> wrote:

Hi Qingsheng,

Thanks for driving this; the inconsistency was not satisfying for me.
I second Alexander's idea, though I could also live with an easier solution as the first step: instead of making caching an implementation detail of a TableFunction X, rather devise a caching layer around X. So the proposal would be a CachingTableFunction that delegates to X in case of misses and otherwise manages the cache. Lifting it into the operator model as proposed would be even better, but it is probably unnecessary in the first step for a lookup source (as the source will only receive the requests after the filter; applying the projection may be more interesting to save memory).

Another advantage is that all the changes of this FLIP would be limited to options; there is no need for new public interfaces. Everything else remains an implementation detail of the Table runtime. That means we can easily incorporate the optimization potential that Alexander pointed out later.
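The delegating wrapper described above could be sketched roughly like this in plain Java (illustrative names only, not the actual CachingTableFunction interface under discussion):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of a caching layer around a lookup function X: consult the
// cache first and delegate to X only on a miss. Illustrative names,
// not the CachingTableFunction proposed in the FLIP.
public class CachingLookup<K, V> {
    private final Function<K, List<V>> delegate; // the wrapped lookup "X"
    private final Map<K, List<V>> cache = new HashMap<>();
    private int misses = 0;

    public CachingLookup(Function<K, List<V>> delegate) {
        this.delegate = delegate;
    }

    public List<V> lookup(K key) {
        return cache.computeIfAbsent(key, k -> {
            misses++;                  // miss: go to the external system
            return delegate.apply(k);  // delegate to X and keep the result
        });
    }

    public int getMisses() {
        return misses;
    }
}
```

A real implementation would additionally need eviction (LRU, TTL) and the standard hit/miss metrics the FLIP proposes; the sketch only shows the delegation pattern.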
@Alexander unfortunately, your architecture is not shared. I don't know the solution to share images to be honest.

On Fri, Apr 22, 2022 at 5:04 PM Александр Смирнов <smirale...@gmail.com> wrote:

Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd really like to become one, and this FLIP really interested me. I have actually worked on a similar feature in my company's Flink fork, and we would like to share our thoughts on this and make the code open source.
I think there is a better alternative than introducing an abstract class for TableFunction (CachingTableFunction). As you know, TableFunction lives in the flink-table-common module, which provides only an API for working with tables – that makes it very convenient to import in connectors. In turn, CachingTableFunction contains logic for runtime execution, so this class and everything connected with it should be located in another module, probably flink-table-runtime. But that would require connectors to depend on another module that contains a lot of runtime logic, which doesn't sound good.
I suggest adding a new method 'getLookupConfig' to LookupTableSource or LookupRuntimeProvider so that connectors only pass configurations to the planner and therefore don't depend on the runtime implementation. Based on these configs, the planner will construct a lookup join operator with the corresponding runtime logic (ProcessFunctions in the flink-table-runtime module). The architecture looks like the one in the pinned image (the LookupConfig class there is actually your CacheConfig). The classes in flink-table-planner that would be responsible for this are CommonPhysicalLookupJoin and its inheritors.
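A rough sketch of this separation, with hypothetical names (neither LookupConfig as written here nor chooseRunner is real Flink API): the connector exposes plain configuration, and the planner selects the runtime runner from it, so connector modules never have to depend on flink-table-runtime:

```java
// Hypothetical sketch of the 'getLookupConfig' idea. The connector hands
// over only configuration; the planner decides which runtime runner to
// instantiate. Names are illustrative, not the actual Flink API.
public class PlannerSketch {
    static final class LookupConfig {
        final boolean cacheEnabled;
        final long maxCacheRows;

        LookupConfig(boolean cacheEnabled, long maxCacheRows) {
            this.cacheEnabled = cacheEnabled;
            this.maxCacheRows = maxCacheRows;
        }
    }

    // Planner-side decision: choose the runner class from config alone.
    static String chooseRunner(LookupConfig config, boolean hasCalc) {
        if (!config.cacheEnabled) {
            return hasCalc ? "LookupJoinRunnerWithCalc" : "LookupJoinRunner";
        }
        return hasCalc ? "LookupJoinCachingRunnerWithCalc" : "LookupJoinCachingRunner";
    }
}
```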
The current classes for lookup join in flink-table-runtime are LookupJoinRunner, AsyncLookupJoinRunner, LookupJoinRunnerWithCalc and AsyncLookupJoinRunnerWithCalc. I suggest adding classes LookupJoinCachingRunner, LookupJoinCachingRunnerWithCalc, etc.

And here comes another, more powerful advantage of such a solution. If we have the caching logic on a lower level, we can apply some optimizations to it. LookupJoinRunnerWithCalc was named like this because it uses the 'calc' function, which actually mostly consists of filters and projections.

For example, when joining table A with lookup table B with the condition 'JOIN … ON A.id = B.id AND A.age = B.age + 10 WHERE B.salary > 1000', the 'calc' function will contain the filters A.age = B.age + 10 and B.salary > 1000.

If we apply this function before storing records in the cache, the size of the cache will be significantly reduced: filters mean we avoid storing useless records in the cache, and projections reduce the records' size. So the initial maximum number of records in the cache can be increased by the user.

What do you think about it?
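The calc-before-cache idea can be illustrated in plain Java (not the actual runner code; the row layout here is a hypothetical {id, age, salary} triple matching the example condition):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of applying the join's "calc" part -- filters and
// projections -- BEFORE records enter the cache, so fewer and smaller
// rows are stored. Row layout (hypothetical): {id, age, salary}.
public class CalcBeforeCache {
    // For the condition "A.age = B.age + 10 AND B.salary > 1000":
    // keep only matching rows and project away the salary column.
    static List<int[]> calcThenCache(int probeAge, List<int[]> lookedUp) {
        List<int[]> toCache = new ArrayList<>();
        for (int[] row : lookedUp) {
            boolean joinCond = probeAge == row[1] + 10;  // A.age = B.age + 10
            boolean whereCond = row[2] > 1000;           // B.salary > 1000
            if (joinCond && whereCond) {
                toCache.add(new int[]{row[0], row[1]});  // projection drops salary
            }
        }
        return toCache;
    }
}
```

Only the filtered, projected rows would be cached, which is exactly why the effective cache capacity grows under this scheme.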
On 2022/04/19 02:47:11 Qingsheng Ren wrote:

Hi devs,

Yuan and I would like to start a discussion about FLIP-221 [1], which introduces an abstraction for the lookup table cache and its standard metrics.

Currently each lookup table source has to implement its own cache to store lookup results, and there is no standard set of metrics for users and developers to tune their jobs with lookup joins, which is a quite common use case in Flink Table / SQL.
Therefore we propose some new APIs including the cache, metrics, wrapper classes of TableFunction, and new table options. Please take a look at the FLIP page [1] for more details. Any suggestions and comments would be appreciated!

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-221+Abstraction+for+lookup+source+cache+and+metric

Best regards,

Qingsheng

--
Best Regards,

Qingsheng Ren
Real-time Computing Team
Alibaba Cloud

Email: renqs...@gmail.com

--
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com