[ https://issues.apache.org/jira/browse/FLINK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-37005:
-----------------------------------
    Labels: pull-request-available  (was: )

> Make StreamExecDeduplicate output insert only where possible
> ------------------------------------------------------------
>
>                 Key: FLINK-37005
>                 URL: https://issues.apache.org/jira/browse/FLINK-37005
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner, Table SQL / Runtime
>    Affects Versions: 2.0.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>
> According to the planner, {{StreamExecDeduplicate}} always outputs
> updates/retractions, even in cases where the runtime operator actually emits
> only inserts. This can cause performance problems, for example by forcing the
> planner to add a {{SinkUpsertMaterializer}} operator downstream of the
> deduplication even though it is not needed.
> In this ticket, I would like to both support outputting inserts only and
> increase the number of cases where that is possible.
> # Proc time keep first row is already implemented in such a way that it
> outputs inserts only, but this is not used/marked in the planner (planner
> change only).
> # Row time keep first row could also be implemented to output inserts only,
> with an operator that emits the deduplication result on watermark instead of
> on each record (planner + runtime change).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
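To make the changelog difference between the two keep-first-row cases concrete, here is a small self-contained simulation. It is a hypothetical sketch, not Flink's runtime code: the `Record`/`Watermark` types and the three function names are illustrative assumptions. The `+I`/`-U`/`+U` tags mirror Flink's `RowKind` short strings. The sketch assumes Flink-style watermark semantics (a watermark `t` means no more on-time records with rowtime `<= t`). It also simplifies by dropping late records and never cleaning up state.

```python
# Hypothetical simulation of keep-first-row deduplication changelogs.
# Not Flink code: Record, Watermark and the functions below are illustrative.
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple, Union


@dataclass(frozen=True)
class Record:
    key: str
    rowtime: int
    value: str


@dataclass(frozen=True)
class Watermark:
    # Assumed meaning: no more on-time records with rowtime <= timestamp.
    timestamp: int


# (kind, record); kinds mimic RowKind short strings: +I insert, -U/+U update.
Change = Tuple[str, Record]


def proc_time_keep_first(records: Iterable[Record]) -> List[Change]:
    """Keep the first record per key in arrival order.

    An emitted row is never revised, so the output is insert-only.
    """
    seen: Set[str] = set()
    out: List[Change] = []
    for r in records:
        if r.key not in seen:
            seen.add(r.key)
            out.append(("+I", r))
    return out


def row_time_keep_first_eager(records: Iterable[Record]) -> List[Change]:
    """Keep the smallest-rowtime record per key, emitting on each record.

    A later-arriving record with a smaller rowtime replaces the current
    result, which requires a retraction (-U) followed by an update (+U).
    """
    best: Dict[str, Record] = {}
    out: List[Change] = []
    for r in records:
        cur = best.get(r.key)
        if cur is None:
            best[r.key] = r
            out.append(("+I", r))
        elif r.rowtime < cur.rowtime:
            best[r.key] = r
            out.append(("-U", cur))
            out.append(("+U", r))
    return out


def row_time_keep_first_on_watermark(
    events: Iterable[Union[Record, Watermark]],
) -> List[Change]:
    """Buffer the best candidate per key; emit it once the watermark passes it.

    After emission no earlier on-time record for that key can arrive, so the
    output is insert-only. Late records are dropped (sketch simplification).
    """
    pending: Dict[str, Record] = {}
    emitted: Set[str] = set()
    watermark = float("-inf")
    out: List[Change] = []
    for e in events:
        if isinstance(e, Watermark):
            watermark = max(watermark, e.timestamp)
            ready = [r for r in pending.values() if r.rowtime <= watermark]
            # Emit finalized rows in rowtime order (key breaks ties).
            for r in sorted(ready, key=lambda x: (x.rowtime, x.key)):
                out.append(("+I", r))
                emitted.add(r.key)
                del pending[r.key]
        elif e.key not in emitted and e.rowtime > watermark:
            cur = pending.get(e.key)
            if cur is None or e.rowtime < cur.rowtime:
                pending[e.key] = e
    return out


if __name__ == "__main__":
    a5, a3, b4 = Record("a", 5, "a5"), Record("a", 3, "a3"), Record("b", 4, "b4")
    late = Record("a", 2, "late")
    runs = [
        ("proc time", proc_time_keep_first([a5, a3, b4, late])),
        ("row time, per record", row_time_keep_first_eager([a5, a3, b4, late])),
        ("row time, on watermark", row_time_keep_first_on_watermark(
            [a5, a3, b4, Watermark(4), late, Watermark(10)])),
    ]
    for name, changes in runs:
        print(name, [(kind, r.value) for kind, r in changes])
```

With the demo input, the proc-time and watermark-based variants emit only `+I` changes. The per-record row-time variant emits `-U`/`+U` pairs whenever an out-of-order record with a smaller rowtime arrives. That retraction stream is what a downstream upsert sink would otherwise have to materialize. The price of the watermark-based design in this sketch is added latency (results wait for the watermark) and a different treatment of late data.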