Gustavo de Morais created FLINK-37844: -----------------------------------------
Summary: FLIP-516 Optimization: Push down projections for StreamingMultiJoinOperator Key: FLINK-37844 URL: https://issues.apache.org/jira/browse/FLINK-37844 Project: Flink Issue Type: Improvement Reporter: Gustavo de Morais We're currently adding support for a StreamingMultiJoinOperator which is able to join N inputs. There are multiple minor optimizations we might be able to do that weren't so easy to do with multiple chained binary joins. One of them is materializing into state only attributes that are either joined in any of the N - 1 join conditions or are projected in the final output. We'd have to do the following: * We already have the information of used fields for each input in joinAttributeMap and can either pass that to the operator or add a new method to the join extractor. * The MultiJoin will contain the list of fields to be projected. We might have to adapt and expose that as a map per inputid when creating the FlinkMultiJoin. * When adding a record to state, we remove attributes that will not be used in join conditions or projected. * If we use null for these attributes, we don't have to adapt the logic. If we recreate rows with a smaller arity, multiple places have to be adjusted so that all our index-based logic is updated and correct. Obs: this was a even more significant problem for binary joins, since we materialized all attributes for all intermediate results. However, it's also relevant here. I plan to measure impacts for each of the optimizations before adding them [based on a benchmark|https://github.com/apache/flink-benchmarks?tab=readme-ov-file#general-remarks], and we'll first merge the operator. However, I'll be documenting the optimizations with tickets so we track them here. This ticket arose from a discussion with [~roman] [here.|https://github.com/apache/flink/pull/26313#discussion_r2105917437] -- This message was sent by Atlassian Jira (v8.20.10#820010)