Piotr Nowojski created FLINK-37005:
--------------------------------------

             Summary: Make StreamExecDeduplicate output insert only where 
possible
                 Key: FLINK-37005
                 URL: https://issues.apache.org/jira/browse/FLINK-37005
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Planner, Table SQL / Runtime
    Affects Versions: 2.0.0
            Reporter: Piotr Nowojski
            Assignee: Piotr Nowojski


According to the planner, {{StreamExecDeduplicate}} always outputs 
updates/retractions, even in cases where the runtime operator does not actually 
produce them. This can cause performance problems, for example by forcing the 
planner to add a {{SinkUpsertMaterializer}} operator downstream of the 
deduplication, even though it is not necessary.
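
For context, a minimal sketch of why a row-time "keep first row" deduplication that emits a result on every record cannot be insert-only. This is a hypothetical, simplified model, not Flink's runtime code: the class name {{PerRecordRowTimeFirstRow}} and the string encoding of changelog rows (+I, -U, +U, following Flink's short notation) are illustrative assumptions. When a row with an earlier rowtime arrives out of order, the previously emitted result has to be retracted and replaced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of row-time "keep first row" deduplication
// that emits output for every input record (NOT Flink's actual operator).
final class PerRecordRowTimeFirstRow {

    // Keyed state: for each key, the rowtime of the row emitted so far.
    private final Map<String, Long> firstRowtime = new HashMap<>();

    // Processes one record and returns the changelog rows it produces:
    // +I = insert, -U = update-before (retraction), +U = update-after.
    List<String> process(String key, long rowtime) {
        List<String> out = new ArrayList<>();
        Long prev = firstRowtime.get(key);
        if (prev == null) {
            // First row seen for this key: a plain insert.
            firstRowtime.put(key, rowtime);
            out.add("+I(" + key + "," + rowtime + ")");
        } else if (rowtime < prev) {
            // An out-of-order row with an earlier rowtime becomes the new "first"
            // row, so the previously emitted result must be retracted and replaced.
            firstRowtime.put(key, rowtime);
            out.add("-U(" + key + "," + prev + ")");
            out.add("+U(" + key + "," + rowtime + ")");
        }
        // Otherwise the row is a duplicate and produces no output.
        return out;
    }
}
```

The -U/+U pair in the out-of-order case is exactly the kind of update output that a downstream sink has to handle, for example via {{SinkUpsertMaterializer}}.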

In this ticket, I would like to both support outputting insert-only results and 
increase the number of cases where that is possible.

# Proc time keep first row is already implemented in such a way that it 
outputs inserts only, but this is not actually used/marked in the planner 
(planner change only)
# Row time keep first row could also be implemented to output inserts only, 
with an operator that emits the deduplication result on watermark instead of on 
each record (planner + runtime change)
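
A minimal sketch of the second idea, under stated assumptions: this is a hypothetical plain-Java model, not a Flink operator, and the names {{WatermarkRowTimeFirstRow}}, {{processElement}} and {{processWatermark}} are illustrative. Per key, it buffers the row with the smallest rowtime and emits it only once the watermark has passed that rowtime. Since a watermark t means no more on-time rows with rowtime <= t are expected, the buffered row is final at that point, so every result is emitted exactly once as an insert; rows arriving later for the same key, or late rows, are dropped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of row-time "keep first row" deduplication that emits
// results only on watermark progress, so its output is insert-only.
final class WatermarkRowTimeFirstRow {

    // Per not-yet-emitted key, the smallest rowtime seen so far
    // (TreeMap only to make the emission order deterministic).
    private final Map<String, Long> candidates = new TreeMap<>();
    // Keys whose first row has already been emitted.
    private final Set<String> emitted = new HashSet<>();
    private long currentWatermark = Long.MIN_VALUE;

    void processElement(String key, long rowtime) {
        if (emitted.contains(key) || rowtime <= currentWatermark) {
            // Result already final for this key, or the row is late: drop it.
            return;
        }
        candidates.merge(key, rowtime, Math::min);
    }

    // Emits, as inserts, every buffered candidate covered by the watermark.
    List<String> processWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = candidates.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= currentWatermark) {
                // No on-time row with a smaller rowtime can still arrive for this
                // key, so the candidate is final and is emitted once as +I.
                out.add("+I(" + e.getKey() + "," + e.getValue() + ")");
                emitted.add(e.getKey());
                it.remove();
            }
        }
        return out;
    }
}
```

The trade-off in this sketch is latency: results are held back until the watermark passes them, in exchange for an insert-only changelog. The {{emitted}} set also grows without bound here; how a real operator would bound that state is left open.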



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
