This is interesting. Thanks for the clarification!
On Mon, Mar 28, 2022 at 4:09 PM Qingsheng Ren <renqs...@gmail.com> wrote:
>
> Hi,
>
> The optimization you mentioned is only applicable to the product provided by
> Alibaba Cloud. In open-source Apache Flink there isn't a unified caching
> abstraction for all lookup tables, and each connector has its own cache
> implementation. For example, JDBC uses a Guava cache and FileSystem uses an
> in-memory HashMap, and neither of them loads all records of the dim table into
> the cache.
>
> Best,
>
> Qingsheng
>
> > On Mar 28, 2022, at 12:26, dz902 <dz9...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've read some docs
> > (https://help.aliyun.com/document_detail/182011.html) stating Flink
> > optimization techniques using:
> >
> > - partitionedJoin = 'true'
> > - cache = 'ALL'
> > - blink.partialAgg.enabled=true
> >
> > However, I could not find any official doc references. Are these
> > supported at all?
> >
> > Also, "partitionedJoin" seemed to have the effect of shuffling input by
> > the join key so it can fit into memory. I read this
> > (https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)
> > and believe this is already the default behavior of Flink.
> >
> > Is this optimization not needed even for huge input tables?
> >
> > Thanks,
> > Dai
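
For anyone reading along: the vendor-specific options discussed above would sit in a dimension table's WITH clause. This is only a sketch based on the linked Alibaba Cloud documentation — the table name, columns, and connector choice are hypothetical, and open-source Flink will reject these options:

```sql
-- Hypothetical dimension-table DDL for Alibaba Cloud Realtime Compute.
-- 'cache' = 'ALL' and 'partitionedJoin' = 'true' are the vendor-specific
-- options from this thread; open-source Flink does not recognize them.
CREATE TABLE dim_products (
  product_id BIGINT,
  product_name VARCHAR,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',        -- connector chosen only for illustration
  'cache' = 'ALL',             -- Alibaba-only: cache the full dim table
  'partitionedJoin' = 'true'   -- Alibaba-only: shuffle probe side by join key
);
```

In open-source Flink, per-connector cache behavior is instead configured through that connector's own options (e.g. the JDBC connector's lookup-cache settings), as Qingsheng notes above.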