Piotr Nowojski created FLINK-37005:
--------------------------------------

             Summary: Make StreamExecDeduplicate output insert only where possible
                 Key: FLINK-37005
                 URL: https://issues.apache.org/jira/browse/FLINK-37005
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Planner, Table SQL / Runtime
    Affects Versions: 2.0.0
            Reporter: Piotr Nowojski
            Assignee: Piotr Nowojski

According to the planner, {{StreamExecDeduplicate}} currently always outputs updates/retractions, even in cases where the runtime does not actually produce them. This can cause performance problems, for example by forcing the planner to add a {{SinkUpsertMaterializer}} operator downstream of the deduplication when it is not actually necessary.

In this ticket, I would like to both support outputting inserts only and increase the number of cases where that is actually the case.

# Proc time keep first row is already implemented in such a way that it outputs inserts only, but this is not used/marked in the planner (planner change only)
# Row time keep first row could also be implemented to output inserts only, with an operator that emits the deduplication result on watermark instead of on each record (planner + runtime change)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)