[ https://issues.apache.org/jira/browse/FLINK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-37005:
-----------------------------------
    Labels: pull-request-available  (was: )

> Make StreamExecDeduplicate output insert only where possible
> ------------------------------------------------------------
>
>                 Key: FLINK-37005
>                 URL: https://issues.apache.org/jira/browse/FLINK-37005
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner, Table SQL / Runtime
>    Affects Versions: 2.0.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>
> According to the planner, {{StreamExecDeduplicate}} currently always outputs 
> updates/retractions, even in cases where the runtime does not actually produce them. 
> This can cause performance problems, for example by forcing the planner to add a 
> {{SinkUpsertMaterializer}} operator downstream of the deduplication when 
> it is not actually necessary. 
> In this ticket, I would like to both support outputting insert-only changes and 
> increase the number of cases where that is possible.
> # Proc time keep first row is already implemented in such a way 
> that it outputs inserts only, but this is not actually used/marked in the 
> planner (planner change only)
> # Row time keep first row could also be implemented to output inserts only, 
> with an operator that emits the deduplication result on watermark, instead of on 
> each record (planner + runtime change)
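The two cases above can be sketched as a small model of the changelog each variant would produce. This is a minimal, hypothetical Python sketch, not Flink's operator code: the class names, the `(key, rowtime, payload)` row shape and the `+I` tag (borrowed from Flink's RowKind short string for inserts) are assumptions for illustration only. Both variants emit only inserts; the row-time one buffers the earliest row per key and emits it once the watermark reaches its rowtime, assuming rows at or below the current watermark are treated as late and dropped.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical row shape for this sketch: (key, rowtime, payload).
Row = Tuple[Any, int, str]
# Changelog tag for an insert, borrowed from Flink's RowKind short string "+I".
INSERT = "+I"


class ProcTimeKeepFirst:
    """Proc time keep first row: the first record seen per key is emitted
    immediately and never changes, so the output is insert-only."""

    def __init__(self) -> None:
        self.seen: Set[Any] = set()

    def process(self, row: Row) -> List[Tuple[str, Row]]:
        key = row[0]  # rowtime is ignored: arrival order decides
        if key in self.seen:
            return []  # a later duplicate: drop it, nothing to retract
        self.seen.add(key)
        return [(INSERT, row)]


class RowTimeKeepFirstOnWatermark:
    """Hypothetical row time keep first row that emits on watermark.

    The earliest-rowtime row per key is buffered; once the watermark reaches
    that rowtime, no earlier row can still arrive (late rows are dropped), so
    the buffered row is final and can be emitted as a plain insert."""

    def __init__(self) -> None:
        self.watermark = float("-inf")
        self.pending: Dict[Any, Row] = {}
        self.emitted: Set[Any] = set()

    def process(self, row: Row) -> List[Tuple[str, Row]]:
        key, rowtime, _ = row
        if key in self.emitted or rowtime <= self.watermark:
            return []  # result already final, or the row is late: drop it
        best = self.pending.get(key)
        if best is None or rowtime < best[1]:
            self.pending[key] = row  # keep the earliest rowtime seen so far
        return []  # nothing is emitted per record

    def on_watermark(self, watermark: int) -> List[Tuple[str, Row]]:
        self.watermark = max(self.watermark, watermark)
        out = []
        for key, row in list(self.pending.items()):
            if row[1] <= self.watermark:  # this key's first row is now final
                del self.pending[key]
                self.emitted.add(key)
                out.append((INSERT, row))
        return out


if __name__ == "__main__":
    proc = ProcTimeKeepFirst()
    for r in [("a", 5, "a1"), ("a", 1, "a2"), ("b", 3, "b1")]:
        print(proc.process(r))

    rt = RowTimeKeepFirstOnWatermark()
    for r in [("a", 5, "a1"), ("a", 2, "a2"), ("b", 7, "b1")]:
        rt.process(r)
    print(rt.on_watermark(5))   # [('+I', ('a', 2, 'a2'))]
    rt.process(("b", 6, "b2"))  # earlier than b1, replaces it in the buffer
    print(rt.on_watermark(10))  # [('+I', ('b', 6, 'b2'))]
```

In this model, a row-time variant that emitted on each record would have output `a1` first and then had to retract it when `a2` arrived with an earlier rowtime. Waiting for the watermark avoids that retraction, at the cost of extra latency and of buffering one row per key until the watermark passes it.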



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
