Hi devs

Ron Liu, Gen Luo and I would like to start a discussion about FLIP-324:
Introduce Runtime Filter for Flink Batch Jobs[1]

Runtime Filter is a common optimization to improve join performance. It is
designed to dynamically generate filter conditions for certain Join queries
at runtime to reduce the amount of scanned or shuffled data, avoid
unnecessary I/O and network transmission, and speed up the query. Its
working principle is building a filter(e.g. bloom filter) based on the data
on the small table side(build side) first, then pass this filter to the
large table side(probe side) to filter the irrelevant data on it, this can
reduce the data reaching the join and improve performance.

You can find more details in the FLIP-324[1]. Looking forward to your
feedback.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-324%3A+Introduce+Runtime+Filter+for+Flink+Batch+Jobs

Best,
Ron & Gen & Lijie

Reply via email to