Gustavo de Morais created FLINK-37844:
-----------------------------------------

             Summary: FLIP-516 Optimization: Push down projections for 
StreamingMultiJoinOperator
                 Key: FLINK-37844
                 URL: https://issues.apache.org/jira/browse/FLINK-37844
             Project: Flink
          Issue Type: Improvement
            Reporter: Gustavo de Morais


We're currently adding support for a StreamingMultiJoinOperator, which can 
join N inputs. It enables several minor optimizations that were hard to do 
with multiple chained binary joins. One of them is materializing into state 
only the attributes that are either used in any of the N - 1 join conditions 
or projected in the final output. We'd have to do the following:

 
 * We already have the used fields for each input in joinAttributeMap and can 
either pass that to the operator or add a new method to the join extractor.
 * The MultiJoin will contain the list of fields to be projected. We might have 
to adapt that and expose it as a map per input ID when creating the 
FlinkMultiJoin.
 * When adding a record to state, we remove the attributes that are neither 
used in a join condition nor projected.
 * If we set these attributes to null, we don't have to adapt the existing 
logic. If we instead recreate rows with a smaller arity, multiple places have 
to be adjusted so that all our index-based logic stays correct.
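
To make the trade-off in the last two bullets concrete, here is a minimal sketch of both options. It is not the operator's code: rows are modeled as plain Object[] rather than Flink's RowData, and the class and method names (StatePruningSketch, usedFields, nullOutUnused, compact, indexMapping) are hypothetical. It assumes the used field indexes per input are already known, for example as derived from joinAttributeMap plus the projected fields.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch, not Flink API. Rows are plain Object[] for illustration only. */
final class StatePruningSketch {

    /** Fields of one input to keep: used in a join condition or in the final projection. */
    static BitSet usedFields(int[] joinFields, int[] projectedFields) {
        BitSet used = new BitSet();
        for (int f : joinFields) {
            used.set(f);
        }
        for (int f : projectedFields) {
            used.set(f);
        }
        return used;
    }

    /** Option A: keep the arity and store null for unused fields, so indexes stay valid. */
    static Object[] nullOutUnused(Object[] row, BitSet used) {
        Object[] pruned = new Object[row.length]; // all positions start as null
        for (int i = used.nextSetBit(0); i >= 0 && i < row.length; i = used.nextSetBit(i + 1)) {
            pruned[i] = row[i];
        }
        return pruned;
    }

    /** Option B: store a narrower row. Assumes every used index is below row.length. */
    static Object[] compact(Object[] row, BitSet used) {
        Object[] compacted = new Object[used.cardinality()];
        int target = 0;
        for (int i = used.nextSetBit(0); i >= 0; i = used.nextSetBit(i + 1)) {
            compacted[target++] = row[i];
        }
        return compacted;
    }

    /** For option B: old index to new index, -1 for dropped fields. */
    static int[] indexMapping(int arity, BitSet used) {
        int[] mapping = new int[arity];
        int target = 0;
        for (int i = 0; i < arity; i++) {
            mapping[i] = used.get(i) ? target++ : -1;
        }
        return mapping;
    }

    public static void main(String[] args) {
        Object[] row = {42L, "alice", "unused payload", 3.5};
        // Field 0 is a join key; fields 1 and 3 appear in the final projection.
        BitSet used = usedFields(new int[] {0}, new int[] {1, 3});
        System.out.println(Arrays.toString(nullOutUnused(row, used)));
        System.out.println(Arrays.toString(compact(row, used)));
        System.out.println(Arrays.toString(indexMapping(row.length, used)));
    }
}
```

With option A, every existing index-based access keeps working because positions do not move. With option B, every join-condition and projection index would have to go through something like indexMapping, which is the adjustment cost the last bullet refers to.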

 

Note: this was an even more significant problem for binary joins, since we 
materialized all attributes for all intermediate results. However, it's also 
relevant here. I plan to measure the impact of each optimization before 
adding it [based on a 
benchmark|https://github.com/apache/flink-benchmarks?tab=readme-ov-file#general-remarks],
 and we'll merge the operator first. In the meantime, I'll document the 
optimizations in tickets so we can track them here. This ticket arose from a 
discussion with [~roman] 
[here|https://github.com/apache/flink/pull/26313#discussion_r2105917437].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
