Thanks, Lijie, for starting this discussion. Runtime Filter is a common optimization for improving join performance and has been adopted by many computing engines such as Spark and Doris. Flink is a unified streaming and batch computing engine, and we are continuously optimizing its batch performance. Because runtime filtering is a general technique that can speed up Flink batch jobs, we are introducing it for batch as well.
Looking forward to all feedback.

Best,
Ron

Lijie Wang <wangdachui9...@gmail.com> wrote on Wed, Jun 14, 2023 at 17:17:

> Hi devs
>
> Ron Liu, Gen Luo and I would like to start a discussion about FLIP-324:
> Introduce Runtime Filter for Flink Batch Jobs[1]
>
> Runtime Filter is a common optimization to improve join performance. It is
> designed to dynamically generate filter conditions for certain Join queries
> at runtime to reduce the amount of scanned or shuffled data, avoid
> unnecessary I/O and network transmission, and speed up the query. Its
> working principle is to first build a filter (e.g. a bloom filter) from the
> data on the small table side (build side), then pass this filter to the
> large table side (probe side) to filter out irrelevant data there. This
> reduces the data reaching the join and improves performance.
>
> You can find more details in the FLIP-324[1]. Looking forward to your
> feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs
>
> Best,
> Ron & Gen & Lijie
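
For readers new to the technique, the working principle described in the quoted message (build a filter on the build side, then apply it on the probe side) can be sketched roughly as below. This is only a minimal illustration in plain Python, not the design proposed in FLIP-324 and not Flink code: the `BloomFilter` class, the `hash_join` helper, the sizes, and the sample data are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter (illustrative): false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def hash_join(build_rows, probe_rows):
    """Plain in-memory hash join on the first field of each row."""
    table = {}
    for key, value in build_rows:
        table.setdefault(key, []).append(value)
    return [(key, b, p) for key, p in probe_rows for b in table.get(key, [])]


# Hypothetical data: a small build side and a larger probe side.
build_side = [(1, "a"), (2, "b"), (3, "c")]
probe_side = [(k % 50, f"row{k}") for k in range(1000)]

# 1. Build the filter from the join keys of the build side.
runtime_filter = BloomFilter()
for key, _ in build_side:
    runtime_filter.add(key)

# 2. Apply it to the probe side before the (simulated) shuffle and join.
filtered_probe = [row for row in probe_side if runtime_filter.might_contain(row[0])]

# 3. The join result is unchanged; only the amount of probe data shrinks.
assert hash_join(build_side, filtered_probe) == hash_join(build_side, probe_side)
print(len(probe_side), "->", len(filtered_probe))
```

Because a Bloom filter never produces false negatives, every probe row that has a match survives the filter, so the join output stays the same while fewer rows need to be shuffled. False positives only mean that some non-matching rows still reach the join, where they are discarded as usual.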