Hi Qingsheng,
I like the current design, thanks for your efforts! I have no
objections at the moment.
+1 to start the vote.
Best regards,
Alexander
ср, 22 июн. 2022 г. в 15:59, Qingsheng Ren :
>
> Hi Jingsong,
>
> 1. Updated and thanks for the reminder!
>
> 2. We could do so for implementation b
Qingsheng Thanks for the update,
Looks good to me!
Best,
Jingsong
On Wed, Jun 22, 2022 at 5:00 PM Qingsheng Ren wrote:
>
> Hi Jingsong,
>
> 1. Updated and thanks for the reminder!
>
> 2. We could do so for implementation but as public interface I prefer not to
> introduce another layer and exp
Hi Jingsong,
1. Updated and thanks for the reminder!
2. We could do so for implementation but as public interface I prefer not to
introduce another layer and expose too much since this FLIP is already a huge
one with bunch of classes and interfaces.
Best,
Qingsheng
> On Jun 22, 2022, at 11:16
Thanks Qingsheng and all.
I like this design.
Some comments:
1. LookupCache implements Serializable?
2. Minor: After FLIP-234 [1], there should be many connectors that
implement both PartialCachingLookupProvider and
PartialCachingAsyncLookupProvider. Can we extract a common interface
for `Looku
Hi devs,
I’d like to push FLIP-221 forward a little bit. Recently we had some offline
discussions and updated the FLIP. Here’s the diff compared to the previous
version:
1. (Async)LookupFunctionProvider is designed as a base interface for
constructing lookup functions.
2. From the LookupFuncti
Hi Becket,
Thanks for your feedback!
1. An alternative way is to let the implementation of cache to decide
whether to store a missing key in the cache instead of the framework.
This sounds more reasonable and makes the LookupProvider interface
cleaner. I can update the FLIP and clarify in the Jav
Thanks for updating the FLIP, Qingsheng. A few more comments:
1. I am still not sure about what is the use case for cacheMissingKey().
More specifically, when would users want to have getCache() return a
non-empty value and cacheMissingKey() returns false?
2. The builder pattern. Usually the buil
Thank Qingsheng for the detailed summary and updates,
The changes look good to me in general. I just have one minor improvement
comment.
Could we add a static util method to the "FullCachingReloadTrigger"
interface for quick usage?
#periodicReloadAtFixedRate(Duration)
#periodicReloadWithFixedDela
Hi Jingsong,
Thanks for your comments!
> AllCache definition is not flexible, for example, PartialCache can use any
> custom storage, while the AllCache can not, AllCache can also be considered
> to store memory or disk, also need a flexible strategy.
We had an offline discussion with Jark an
Hi Becket,
Thanks for your comments!
1. We have removed the LookupCacheFactory in the latest design and we added
open/close method to the LookupCache for initialization.
2. Custom reload strategy is a great idea! We added a new interface
FullCachingReloadTrigger for developers to implement t
Hi Alexander,
Thanks for the update! I made some updates on the FLIP and here’s some diffs
compared to your version:
1. ReloadStrategy has been covered by the new interface
FullCachingReloadTrigger. This trigger will be taken over and run by the
LookupJoinRunner and provides enough flexibilit
Hi devs,
Thanks for the in-depth discussion! We recently update some designs of FLIP-221
[1]:
1. Introduce a new interface “FullCachingReloadTrigger” for developer to
customize the reload strategy. The previous design was only time-based and not
flexable enough. Developers can implement any lo
Thanks Alexander for your reply. We can discuss the new interface when it
comes out.
We are more inclined to deprecate the connector `async` option when
discussing FLIP-234 [1]. We should use hint to let planner decide.
Although the discussion has not yet produced a conclusion, can we remove
this
Hi Jark,
Thanks for clarifying it. It would be fine. as long as we could provide the
no-cache solution. I was just wondering if the client side cache could
really help when HBase is used, since the data to look up should be huge.
Depending how much data will be cached on the client side, the data
Hi Jingsong and devs!
I agree that custom reloading would be very useful, so I changed
recently proposed ReloadTime to customizable ReloadStrategy and its
default realization FixedDelayReloadStrategy. I updated the FLIP, you
can look at the new design [1]. From my point of view, the
disadvantage o
Hi Jing Ge,
What do you mean about the "impact on the block cache used by HBase"?
In my understanding, the connector cache and HBase cache are totally two
things.
The connector cache is a local/client cache, and the HBase cache is a
server cache.
> does it make sense to have a no-cache solution a
Thanks all for the valuable discussion. The new feature looks very
interesting.
According to the FLIP description: "*Currently we have JDBC, Hive and HBase
connector implemented lookup table source. All existing implementations
will be migrated to the current design and the migration will be
trans
Hi all,
I think the problem now is below:
1. AllCache and PartialCache interface on the non-uniform, one needs to
provide LookupProvider, the other needs to provide CacheBuilder.
2. AllCache definition is not flexible, for example, PartialCache can use
any custom storage, while the AllCache can no
Thanks Qingsheng and all for your discussion.
Very sorry to jump in so late.
Maybe I missed something?
My first impression when I saw the cache interface was, why don't we
provide an interface similar to guava cache [1], on top of guava cache,
caffeine also makes extensions for asynchronous calls
After looking at the new introduced ReloadTime and Becket's comment,
I agree with Becket we should have a pluggable reloading strategy.
We can provide some common implementations, e.g., periodic reloading, and
daily reloading.
But there definitely be some connector- or business-specific reloading
s
Hi Qingsheng,
Thanks for updating the FLIP. A few comments / questions below:
1. Is there a reason that we have both "XXXFactory" and "XXXProvider". What
is the difference between them? If they are the same, can we just use
XXXFactory everywhere?
2. Regarding the FullCachingLookupProvider, shoul
Hi Qingsheng and devs,
I like the new concept of partial / full caching. Thanks for your
efforts, Qingsheng!
> The schedule-with-delay idea looks reasonable to me, but I think we need to
> redesign the builder API of full caching to make it more descriptive > for
> developers.
I've got the acc
Hi Lincoln and Jark,
Thanks for the comments! If the community reaches a consensus that we use SQL
hint instead of table options to decide whether to use sync or async mode, it’s
indeed not necessary to introduce the “lookup.async” option.
I think it’s a good idea to let the decision of async
Hi Jark,
Thanks for your reply!
Currently 'lookup.async' just lies in HBase connector, I have no idea
whether or when to remove it (we can discuss it in another issue for the
HBase connector after FLINK-27625 is done), just not add it into a common
option now.
Best,
Lincoln Lee
Jark Wu 于2022年
Hi Lincoln,
I have taken a look at FLIP-234, and I agree with you that the connectors
can
provide both async and sync runtime providers simultaneously instead of one
of them.
At that point, "lookup.async" looks redundant. If this option is planned to
be removed
in the long term, I think it makes s
Hi Qingsheng,
Sorry for jumping into the discussion so late. It's a good idea that we can
have a common table option. I have a minor comments on 'lookup.async' that
not make it a common option:
The table layer abstracts both sync and async lookup capabilities,
connectors implementers can choose
Hi Alexander,
Thanks for the review! We recently updated the FLIP and you can find those
changes from my latest email. Since some terminologies has changed so I’ll use
the new concept for replying your comments.
1. Builder vs ‘of’
I’m OK to use builder pattern if we have additional optional pa
Hi devs,
We recently updated FLIP-221 [1] in order to make the concept of caching
clearer and introduce some new common lookup table options. The changes are
listed below:
1. "LRU cache" and “all cache” have been renamed as “partial cache” and “full
cache”. It is not necessary for the cache to
Also I have a few additions:
1) maybe rename 'lookup.cache.maximum-size' to
'lookup.cache.max-rows'? I think it will be more clear that we talk
not about bytes, but about the number of rows. Plus it fits more,
considering my optimization with filters.
2) How will users enable rescanning? Are we goi
Hi Qingsheng and Jark,
1. Builders vs 'of'
I understand that builders are used when we have multiple parameters.
I suggested them because we could add parameters later. To prevent
Builder for ScanRuntimeProvider from looking redundant I can suggest
one more config now - "rescanStartTime".
It's a t
Thanks Qingsheng and Alexander for the update. Current API and Options design
of this FLIP look good enough from my side,.
If no more concerns about the thread, I think we can start a VOTE thread later.
Best,
Leonard
> 2022年5月18日 下午5:04,Qingsheng Ren 写道:
>
> Hi Jark and Alexander,
>
> Thanks
Hi Jark and Alexander,
Thanks for your comments! I’m also OK to introduce common table options. I
prefer to introduce a new DefaultLookupCacheOptions class for holding these
option definitions because putting all options into FactoryUtil would make
it a bit ”crowded” and not well categorized.
FLI
Hi Alex,
1) retry logic
I think we can extract some common retry logic into utilities, e.g.
RetryUtils#tryTimes(times, call).
This seems independent of this FLIP and can be reused by DataStream users.
Maybe we can open an issue to discuss this and where to put it.
2) cache ConfigOptions
I'm fine
Hi Qingsheng,
Thank you for considering my comments.
> there might be custom logic before making retry, such as re-establish the
> connection
Yes, I understand that. I meant that such logic can be placed in a
separate function, that can be implemented by connectors. Just moving
the retry logic
Thank Qingsheng for updating the design.
I just have a minor comment.
Changing RescanRuntimeProvider into the builder pattern seems not necessary.
The builder pattern is usually used when there are several optional
parameters.
However, ScanRuntimeProvider and rescan interval are both required when
Hi devs,
I just updated FLIP-221 [1] according to discussions addressed above. This
version made minor changes to make interfaces clearer:
1. The argument type of LookupFunction#lookup and
AsyncLookupFunction#asyncLookup are changed from Object... to RowData, in order
to be symmetric with out
Hi Qingsheng, Alexander,
Thanks for your reply.
> Can you give an example of such upper - level Cache usage? It's not clear for
> me currently. I think it's unnecessary to have such high level abstraction,
> if nowhere in the code we won't operate with objects as instances of Cache.
> But may
Hi Yuan,
Thanks for the review! Basically I’m with Alexander opinion. We’d like to limit
the scope in lookup scenario so we didn’t extend the cache to a generic one.
And as for the metric I think the existing metric definitions are also
applicable for all-cache case.
Cheers,
Qingsheng
> O
Hi Alexander,
Thanks for the review and glad to see we are on the same page! I think you
forgot to cc the dev mailing list so I’m also quoting your reply under this
email.
> We can add 'maxRetryTimes' option into this class
In my opinion the retry logic should be implemented in lookup() ins
Hi Yuan!
> How about abtract the LookupCache to a higher level with a common Cache?
Can you give an example of such upper - level Cache usage? It's not
clear for me currently. I think it's unnecessary to have such high
level abstraction, if nowhere in the code we won't operate with
objects as ins
Hi Qingsheng and devs,
Thanks for your heated discussion and redesign to optmize this feature. I just
have two comments:
1. How about abtract the LookupCache to a higher level with a common Cache? It
will be convenient for devs to use in other place.
2. Does it have any metrics, such as
Hi Alexander and devs,
Thank you very much for the in-depth discussion! As Jark mentioned we were
inspired by Alexander's idea and made a refactor on our design. FLIP-221
[1] has been updated to reflect our design now and we are happy to hear
more suggestions from you!
Compared to the previous de
Hi Jark, Qingsheng and Leonard!
Glad to see that we came to a consensus on almost all points!
However I'm a little confused whether InputFormat is deprecated or
not. Am I right that it will be so in the future, but currently it's
not? Actually I also think that for the first version it's OK to us
Hi Alex,
Thanks for summarizing your points.
In the past week, Qingsheng, Leonard, and I have discussed it several times
and we have totally refactored the design.
I'm glad to say we have reached a consensus on many of your points!
Qingsheng is still working on updating the design docs and maybe
Hi Martijn!
Got it. Therefore, the realization with InputFormat is not considered.
Thanks for clearing that up!
Best regards,
Smirnov Alexander
чт, 12 мая 2022 г. в 14:23, Martijn Visser :
>
> Hi,
>
> With regards to:
>
> > But if there are plans to refactor all connectors to FLIP-27
>
> Yes, FL
Hi,
With regards to:
> But if there are plans to refactor all connectors to FLIP-27
Yes, FLIP-27 is the target for all connectors. The old interfaces will be
deprecated and connectors will either be refactored to use the new ones or
dropped.
The caching should work for connectors that are using
Hi Jark!
Sorry for the late response. I would like to make some comments and
clarify my points.
1) I agree with your first statement. I think we can achieve both
advantages this way: put the Cache interface in flink-table-common,
but have implementations of it in flink-table-runtime. Therefore if
It's great to see the active discussion! I want to share my ideas:
1) implement the cache in framework vs. connectors base
I don't have a strong opinion on this. Both ways should work (e.g., cache
pruning, compatibility).
The framework way can provide more concise interfaces.
The connector base wa
It's a much more complicated activity and lies out of the scope of this
improvement. Because such pushdowns should be done for all ScanTableSource
implementations (not only for Lookup ones).
On Thu, 5 May 2022 at 19:02, Martijn Visser
wrote:
> Hi everyone,
>
> One question regarding "And Alexand
Hi everyone,
One question regarding "And Alexander correctly mentioned that filter
pushdown still is not implemented for jdbc/hive/hbase." -> Would an
alternative solution be to actually implement these filter pushdowns? I can
imagine that there are many more benefits to doing that, outside of loo
Hi everyone!
Thanks for driving such a valuable improvement!
I do think that single cache implementation would be a nice opportunity for
users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
anyway - doesn't matter how it will be implemented.
Putting myself in the user's shoes
Hi everyone,
I don't have much to chip in, but just wanted to express that I really
appreciate the in-depth discussion on this topic and I hope that others
will join the conversation.
Best regards,
Martijn
On Tue, 3 May 2022 at 10:15, Александр Смирнов wrote:
> Hi Qingsheng, Leonard and Jark,
Hi Qingsheng, Leonard and Jark,
Thanks for your detailed feedback! However, I have questions about
some of your statements (maybe I didn't get something?).
> Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF proc_time”
I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time"
Hi Alexander and Arvid,
Thanks for the discussion and sorry for my late response! We had an
internal discussion together with Jark and Leonard and I’d like to
summarize our ideas. Instead of implementing the cache logic in the table
runtime layer or wrapping around the user-provided table function
Thanks for the response, Arvid!
I have few comments on your message.
> but could also live with an easier solution as the first step:
I think that these 2 ways are mutually exclusive (originally proposed
by Qingsheng and mine), because conceptually they follow the same
goal, but implementation d
Hi Qingsheng,
Thanks for driving this; the inconsistency was not satisfying for me.
I second Alexander's idea though but could also live with an easier
solution as the first step: Instead of making caching an implementation
detail of TableFunction X, rather devise a caching layer around X. So the
Hi Qingsheng! My name is Alexander, I'm not a committer yet, but I'd
really like to become one. And this FLIP really interested me.
Actually I have worked on a similar feature in my company’s Flink
fork, and we would like to share our thoughts on this and make code
open source.
I think there is a
Hi devs,
Yuan and I would like to start a discussion about FLIP-221[1], which introduces
an abstraction of lookup table cache and its standard metrics.
Currently each lookup table source should implement their own cache to store
lookup results, and there isn’t a standard of metrics for users a
58 matches
Mail list logo