Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Alexander Smirnov
Hi Qingsheng, I like the current design, thanks for your efforts! I have no objections at the moment. +1 to start the vote. Best regards, Alexander ср, 22 июн. 2022 г. в 15:59, Qingsheng Ren : > > Hi Jingsong, > > 1. Updated and thanks for the reminder! > > 2. We could do so for implementation b

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Jingsong Li
Qingsheng Thanks for the update, Looks good to me! Best, Jingsong On Wed, Jun 22, 2022 at 5:00 PM Qingsheng Ren wrote: > > Hi Jingsong, > > 1. Updated and thanks for the reminder! > > 2. We could do so for implementation but as public interface I prefer not to > introduce another layer and exp

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-22 Thread Qingsheng Ren
Hi Jingsong, 1. Updated and thanks for the reminder! 2. We could do so for implementation but as public interface I prefer not to introduce another layer and expose too much since this FLIP is already a huge one with bunch of classes and interfaces. Best, Qingsheng > On Jun 22, 2022, at 11:16

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-21 Thread Jingsong Li
Thanks Qingsheng and all. I like this design. Some comments: 1. LookupCache implements Serializable? 2. Minor: After FLIP-234 [1], there should be many connectors that implement both PartialCachingLookupProvider and PartialCachingAsyncLookupProvider. Can we extract a common interface for `Looku

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-21 Thread Qingsheng Ren
Hi devs, I’d like to push FLIP-221 forward a little bit. Recently we had some offline discussions and updated the FLIP. Here’s the diff compared to the previous version: 1. (Async)LookupFunctionProvider is designed as a base interface for constructing lookup functions. 2. From the LookupFuncti

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-02 Thread Qingsheng Ren
Hi Becket, Thanks for your feedback! 1. An alternative way is to let the implementation of cache to decide whether to store a missing key in the cache instead of the framework. This sounds more reasonable and makes the LookupProvider interface cleaner. I can update the FLIP and clarify in the Jav

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Becket Qin
Thanks for updating the FLIP, Qingsheng. A few more comments: 1. I am still not sure about what is the use case for cacheMissingKey(). More specifically, when would users want to have getCache() return a non-empty value and cacheMissingKey() returns false? 2. The builder pattern. Usually the buil

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Jark Wu
Thank Qingsheng for the detailed summary and updates, The changes look good to me in general. I just have one minor improvement comment. Could we add a static util method to the "FullCachingReloadTrigger" interface for quick usage? #periodicReloadAtFixedRate(Duration) #periodicReloadWithFixedDela

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Qingsheng Ren
Hi Jingsong, Thanks for your comments! > AllCache definition is not flexible, for example, PartialCache can use any > custom storage, while the AllCache can not, AllCache can also be considered > to store memory or disk, also need a flexible strategy. We had an offline discussion with Jark an

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Qingsheng Ren
Hi Becket, Thanks for your comments! 1. We have removed the LookupCacheFactory in the latest design and we added open/close method to the LookupCache for initialization. 2. Custom reload strategy is a great idea! We added a new interface FullCachingReloadTrigger for developers to implement t

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Qingsheng Ren
Hi Alexander, Thanks for the update! I made some updates on the FLIP and here’s some diffs compared to your version: 1. ReloadStrategy has been covered by the new interface FullCachingReloadTrigger. This trigger will be taken over and run by the LookupJoinRunner and provides enough flexibilit

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Qingsheng Ren
Hi devs, Thanks for the in-depth discussion! We recently update some designs of FLIP-221 [1]: 1. Introduce a new interface “FullCachingReloadTrigger” for developer to customize the reload strategy. The previous design was only time-based and not flexable enough. Developers can implement any lo

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-06-01 Thread Jingsong Li
Thanks Alexander for your reply. We can discuss the new interface when it comes out. We are more inclined to deprecate the connector `async` option when discussing FLIP-234 [1]. We should use hint to let planner decide. Although the discussion has not yet produced a conclusion, can we remove this

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-31 Thread Jing Ge
Hi Jark, Thanks for clarifying it. It would be fine. as long as we could provide the no-cache solution. I was just wondering if the client side cache could really help when HBase is used, since the data to look up should be huge. Depending how much data will be cached on the client side, the data

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-30 Thread Alexander Smirnov
Hi Jingsong and devs! I agree that custom reloading would be very useful, so I changed recently proposed ReloadTime to customizable ReloadStrategy and its default realization FixedDelayReloadStrategy. I updated the FLIP, you can look at the new design [1]. From my point of view, the disadvantage o

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-30 Thread Jark Wu
Hi Jing Ge, What do you mean about the "impact on the block cache used by HBase"? In my understanding, the connector cache and HBase cache are totally two things. The connector cache is a local/client cache, and the HBase cache is a server cache. > does it make sense to have a no-cache solution a

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-27 Thread Jing Ge
Thanks all for the valuable discussion. The new feature looks very interesting. According to the FLIP description: "*Currently we have JDBC, Hive and HBase connector implemented lookup table source. All existing implementations will be migrated to the current design and the migration will be trans

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-27 Thread Jingsong Li
Hi all, I think the problem now is below: 1. AllCache and PartialCache interface on the non-uniform, one needs to provide LookupProvider, the other needs to provide CacheBuilder. 2. AllCache definition is not flexible, for example, PartialCache can use any custom storage, while the AllCache can no

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-26 Thread Jingsong Li
Thanks Qingsheng and all for your discussion. Very sorry to jump in so late. Maybe I missed something? My first impression when I saw the cache interface was, why don't we provide an interface similar to guava cache [1], on top of guava cache, caffeine also makes extensions for asynchronous calls

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-26 Thread Jark Wu
After looking at the new introduced ReloadTime and Becket's comment, I agree with Becket we should have a pluggable reloading strategy. We can provide some common implementations, e.g., periodic reloading, and daily reloading. But there definitely be some connector- or business-specific reloading s

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-25 Thread Becket Qin
Hi Qingsheng, Thanks for updating the FLIP. A few comments / questions below: 1. Is there a reason that we have both "XXXFactory" and "XXXProvider". What is the difference between them? If they are the same, can we just use XXXFactory everywhere? 2. Regarding the FullCachingLookupProvider, shoul

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-25 Thread Alexander Smirnov
Hi Qingsheng and devs, I like the new concept of partial / full caching. Thanks for your efforts, Qingsheng! > The schedule-with-delay idea looks reasonable to me, but I think we need to > redesign the builder API of full caching to make it more descriptive > for > developers. I've got the acc

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-25 Thread Qingsheng Ren
Hi Lincoln and Jark, Thanks for the comments! If the community reaches a consensus that we use SQL hint instead of table options to decide whether to use sync or async mode, it’s indeed not necessary to introduce the “lookup.async” option. I think it’s a good idea to let the decision of async

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-25 Thread Lincoln Lee
Hi Jark, Thanks for your reply! Currently 'lookup.async' just lies in HBase connector, I have no idea whether or when to remove it (we can discuss it in another issue for the HBase connector after FLINK-27625 is done), just not add it into a common option now. Best, Lincoln Lee Jark Wu 于2022年

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-24 Thread Jark Wu
Hi Lincoln, I have taken a look at FLIP-234, and I agree with you that the connectors can provide both async and sync runtime providers simultaneously instead of one of them. At that point, "lookup.async" looks redundant. If this option is planned to be removed in the long term, I think it makes s

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-23 Thread Lincoln Lee
Hi Qingsheng, Sorry for jumping into the discussion so late. It's a good idea that we can have a common table option. I have a minor comments on 'lookup.async' that not make it a common option: The table layer abstracts both sync and async lookup capabilities, connectors implementers can choose

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-22 Thread Qingsheng Ren
Hi Alexander, Thanks for the review! We recently updated the FLIP and you can find those changes from my latest email. Since some terminologies has changed so I’ll use the new concept for replying your comments. 1. Builder vs ‘of’ I’m OK to use builder pattern if we have additional optional pa

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-22 Thread Qingsheng Ren
Hi devs, We recently updated FLIP-221 [1] in order to make the concept of caching clearer and introduce some new common lookup table options. The changes are listed below: 1. "LRU cache" and “all cache” have been renamed as “partial cache” and “full cache”. It is not necessary for the cache to

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-19 Thread Александр Смирнов
Also I have a few additions: 1) maybe rename 'lookup.cache.maximum-size' to 'lookup.cache.max-rows'? I think it will be more clear that we talk not about bytes, but about the number of rows. Plus it fits more, considering my optimization with filters. 2) How will users enable rescanning? Are we goi

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-19 Thread Александр Смирнов
Hi Qingsheng and Jark, 1. Builders vs 'of' I understand that builders are used when we have multiple parameters. I suggested them because we could add parameters later. To prevent Builder for ScanRuntimeProvider from looking redundant I can suggest one more config now - "rescanStartTime". It's a t

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-18 Thread Leonard Xu
Thanks Qingsheng and Alexander for the update. Current API and Options design of this FLIP look good enough from my side,. If no more concerns about the thread, I think we can start a VOTE thread later. Best, Leonard > 2022年5月18日 下午5:04,Qingsheng Ren 写道: > > Hi Jark and Alexander, > > Thanks

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-18 Thread Qingsheng Ren
Hi Jark and Alexander, Thanks for your comments! I’m also OK to introduce common table options. I prefer to introduce a new DefaultLookupCacheOptions class for holding these option definitions because putting all options into FactoryUtil would make it a bit ”crowded” and not well categorized. FLI

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Jark Wu
Hi Alex, 1) retry logic I think we can extract some common retry logic into utilities, e.g. RetryUtils#tryTimes(times, call). This seems independent of this FLIP and can be reused by DataStream users. Maybe we can open an issue to discuss this and where to put it. 2) cache ConfigOptions I'm fine

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Александр Смирнов
Hi Qingsheng, Thank you for considering my comments. > there might be custom logic before making retry, such as re-establish the > connection Yes, I understand that. I meant that such logic can be placed in a separate function, that can be implemented by connectors. Just moving the retry logic

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Jark Wu
Thank Qingsheng for updating the design. I just have a minor comment. Changing RescanRuntimeProvider into the builder pattern seems not necessary. The builder pattern is usually used when there are several optional parameters. However, ScanRuntimeProvider and rescan interval are both required when

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Qingsheng Ren
Hi devs, I just updated FLIP-221 [1] according to discussions addressed above. This version made minor changes to make interfaces clearer: 1. The argument type of LookupFunction#lookup and AsyncLookupFunction#asyncLookup are changed from Object... to RowData, in order to be symmetric with out

Re:Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread zst...@163.com
Hi Qingsheng, Alexander, Thanks for your reply. > Can you give an example of such upper - level Cache usage? It's not clear for > me currently. I think it's unnecessary to have such high level abstraction, > if nowhere in the code we won't operate with objects as instances of Cache. > But may

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Qingsheng Ren
Hi Yuan, Thanks for the review! Basically I’m with Alexander opinion. We’d like to limit the scope in lookup scenario so we didn’t extend the cache to a generic one. And as for the metric I think the existing metric definitions are also applicable for all-cache case. Cheers, Qingsheng > O

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-17 Thread Qingsheng Ren
Hi Alexander, Thanks for the review and glad to see we are on the same page! I think you forgot to cc the dev mailing list so I’m also quoting your reply under this email. > We can add 'maxRetryTimes' option into this class In my opinion the retry logic should be implemented in lookup() ins

Re: Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-16 Thread Александр Смирнов
Hi Yuan! > How about abtract the LookupCache to a higher level with a common Cache? Can you give an example of such upper - level Cache usage? It's not clear for me currently. I think it's unnecessary to have such high level abstraction, if nowhere in the code we won't operate with objects as ins

Re:Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-15 Thread zst...@163.com
Hi Qingsheng and devs, Thanks for your heated discussion and redesign to optmize this feature. I just have two comments: 1. How about abtract the LookupCache to a higher level with a common Cache? It will be convenient for devs to use in other place. 2. Does it have any metrics, such as

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-13 Thread Qingsheng Ren
Hi Alexander and devs, Thank you very much for the in-depth discussion! As Jark mentioned we were inspired by Alexander's idea and made a refactor on our design. FLIP-221 [1] has been updated to reflect our design now and we are happy to hear more suggestions from you! Compared to the previous de

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-12 Thread Александр Смирнов
Hi Jark, Qingsheng and Leonard! Glad to see that we came to a consensus on almost all points! However I'm a little confused whether InputFormat is deprecated or not. Am I right that it will be so in the future, but currently it's not? Actually I also think that for the first version it's OK to us

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-12 Thread Jark Wu
Hi Alex, Thanks for summarizing your points. In the past week, Qingsheng, Leonard, and I have discussed it several times and we have totally refactored the design. I'm glad to say we have reached a consensus on many of your points! Qingsheng is still working on updating the design docs and maybe

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-12 Thread Александр Смирнов
Hi Martijn! Got it. Therefore, the realization with InputFormat is not considered. Thanks for clearing that up! Best regards, Smirnov Alexander чт, 12 мая 2022 г. в 14:23, Martijn Visser : > > Hi, > > With regards to: > > > But if there are plans to refactor all connectors to FLIP-27 > > Yes, FL

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-12 Thread Martijn Visser
Hi, With regards to: > But if there are plans to refactor all connectors to FLIP-27 Yes, FLIP-27 is the target for all connectors. The old interfaces will be deprecated and connectors will either be refactored to use the new ones or dropped. The caching should work for connectors that are using

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-11 Thread Александр Смирнов
Hi Jark! Sorry for the late response. I would like to make some comments and clarify my points. 1) I agree with your first statement. I think we can achieve both advantages this way: put the Cache interface in flink-table-common, but have implementations of it in flink-table-runtime. Therefore if

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Jark Wu
It's great to see the active discussion! I want to share my ideas: 1) implement the cache in framework vs. connectors base I don't have a strong opinion on this. Both ways should work (e.g., cache pruning, compatibility). The framework way can provide more concise interfaces. The connector base wa

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Roman Boyko
It's a much more complicated activity and lies out of the scope of this improvement. Because such pushdowns should be done for all ScanTableSource implementations (not only for Lookup ones). On Thu, 5 May 2022 at 19:02, Martijn Visser wrote: > Hi everyone, > > One question regarding "And Alexand

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Martijn Visser
Hi everyone, One question regarding "And Alexander correctly mentioned that filter pushdown still is not implemented for jdbc/hive/hbase." -> Would an alternative solution be to actually implement these filter pushdowns? I can imagine that there are many more benefits to doing that, outside of loo

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Roman Boyko
Hi everyone! Thanks for driving such a valuable improvement! I do think that single cache implementation would be a nice opportunity for users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics anyway - doesn't matter how it will be implemented. Putting myself in the user's shoes

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-04 Thread Martijn Visser
Hi everyone, I don't have much to chip in, but just wanted to express that I really appreciate the in-depth discussion on this topic and I hope that others will join the conversation. Best regards, Martijn On Tue, 3 May 2022 at 10:15, Александр Смирнов wrote: > Hi Qingsheng, Leonard and Jark,

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-03 Thread Александр Смирнов
Hi Qingsheng, Leonard and Jark, Thanks for your detailed feedback! However, I have questions about some of your statements (maybe I didn't get something?). > Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time” I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time"

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-04-29 Thread Qingsheng Ren
Hi Alexander and Arvid, Thanks for the discussion and sorry for my late response! We had an internal discussion together with Jark and Leonard and I’d like to summarize our ideas. Instead of implementing the cache logic in the table runtime layer or wrapping around the user-provided table function

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-04-26 Thread Александр Смирнов
Thanks for the response, Arvid! I have few comments on your message. > but could also live with an easier solution as the first step: I think that these 2 ways are mutually exclusive (originally proposed by Qingsheng and mine), because conceptually they follow the same goal, but implementation d

Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-04-25 Thread Arvid Heise
Hi Qingsheng, Thanks for driving this; the inconsistency was not satisfying for me. I second Alexander's idea though but could also live with an easier solution as the first step: Instead of making caching an implementation detail of TableFunction X, rather devise a caching layer around X. So the

RE: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-04-22 Thread Александр Смирнов
Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd really like to become one. And this FLIP really interested me. Actually I have worked on a similar feature in my company’s Flink fork, and we would like to share our thoughts on this and make code open source. I think there is a

[DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-04-18 Thread Qingsheng Ren
Hi devs, Yuan and I would like to start a discussion about FLIP-221[1], which introduces an abstraction of lookup table cache and its standard metrics. Currently each lookup table source should implement their own cache to store lookup results, and there isn’t a standard of metrics for users a