This is interesting. Thanks for the clarification!
On Mon, Mar 28, 2022 at 4:09 PM Qingsheng Ren <renqs...@gmail.com> wrote:
>
> Hi,
>
> The optimization you mentioned is only applicable to the product provided by
> Alibaba Cloud. In open-source Apache Flink there isn't a unified caching
> abstraction for all lookup tables, and each connector has its own cache
> implementation. For example, JDBC uses a Guava cache and FileSystem uses an
> in-memory HashMap, and neither of them loads all records of the dim table into
> the cache.
>
> Best,
>
> Qingsheng
>
> > On Mar 28, 2022, at 12:26, dz902 <dz9...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've read some docs
> > (https://help.aliyun.com/document_detail/182011.html) stating Flink
> > optimization techniques using:
> >
> > - partitionedJoin = 'true'
> > - cache = 'ALL'
> > - blink.partialAgg.enabled=true
> >
> > However, I could not find any official doc references. Are these
> > supported at all?
> >
> > Also, "partitionedJoin" seemed to have the effect of shuffling input by
> > the join key so it can fit into memory. I read this
> > (https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html)
> > and believe this is already the default behavior of Flink.
> >
> > Is this optimization not needed even for huge input tables?
> >
> > Thanks,
> > Dai
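
For anyone reading along: the vendor-specific options discussed above would sit in a dimension table's WITH clause. This is only a sketch based on the linked Alibaba Cloud documentation — the table name, columns, and connector choice are hypothetical, and open-source Flink will reject these options:

```sql
-- Hypothetical dimension-table DDL for Alibaba Cloud Realtime Compute.
-- 'cache' = 'ALL' and 'partitionedJoin' = 'true' are the vendor-specific
-- options from this thread; open-source Flink does not recognize them.
CREATE TABLE dim_products (
  product_id BIGINT,
  product_name VARCHAR,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',        -- connector chosen only for illustration
  'cache' = 'ALL',             -- Alibaba-only: cache the full dim table
  'partitionedJoin' = 'true'   -- Alibaba-only: shuffle probe side by join key
);
```

In open-source Flink, per-connector cache behavior is instead configured through that connector's own options (e.g. the JDBC connector's lookup-cache settings), as Qingsheng notes above.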