[jira] [Commented] (FLINK-18996) Avoid disorder for time interval join

Chalres Tan (Jira) Tue, 21 Mar 2023 20:46:53 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703468#comment-17703468
 ]


Chalres Tan commented on FLINK-18996:
-------------------------------------

+1 [~zicat]. I was pointed to [this design doc|http://goo.gl/VW5Gpd] and 
https://issues.apache.org/jira/browse/FLINK-6233.

In the design doc they mention "Considering that, performing cache cleaning too 
frequently may affect efficiency. We add a default delay to postpone this 
process, i.e., minCleanUpInterval = (LSize + RSize) / 2."

It seems minCleanUpInterval exists to reduce how often cleanup runs, possibly 
to save compute. I agree with you that delaying cleanup causes issues for 
downstream operators and that the default should be minCleanUpInterval = 0. If 
we cannot remove minCleanUpInterval or change its default to 0, perhaps we can 
expose an option that lets the user override this value.
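
To make the proposal concrete, here is a minimal sketch of how such an override 
could behave. It is not Flink's actual code: the class, the method names and the 
nullable userOverride parameter are hypothetical, and only the default formula 
(LSize + RSize) / 2 comes from the design doc quoted above.

```java
public class CleanupIntervalSketch {

    /** Default delay as quoted from the design doc: (LSize + RSize) / 2. */
    static long defaultMinCleanUpInterval(long leftSize, long rightSize) {
        return (leftSize + rightSize) / 2;
    }

    /**
     * Hypothetical override: a non-null user value replaces the default.
     * Passing 0 disables the extra cleanup delay entirely.
     */
    static long effectiveMinCleanUpInterval(long leftSize, long rightSize, Long userOverride) {
        if (userOverride == null) {
            return defaultMinCleanUpInterval(leftSize, rightSize);
        }
        if (userOverride < 0) {
            throw new IllegalArgumentException("minCleanUpInterval must be >= 0");
        }
        return userOverride;
    }

    public static void main(String[] args) {
        long leftSize = 60_000L;   // assumed left bound of the join interval, in ms
        long rightSize = 120_000L; // assumed right bound of the join interval, in ms

        // No override: falls back to the design-doc default.
        System.out.println(effectiveMinCleanUpInterval(leftSize, rightSize, null)); // 90000

        // Override with 0: cleanup is not postponed.
        System.out.println(effectiveMinCleanUpInterval(leftSize, rightSize, 0L)); // 0
    }
}
```

With a 1 minute / 2 minute interval the default postpones cleanup by 90 seconds, 
which is the delay that downstream operators would no longer see if the value 
were 0.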

> Avoid disorder for time interval join
> -------------------------------------
>
>                 Key: FLINK-18996
>                 URL: https://issues.apache.org/jira/browse/FLINK-18996
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: Benchao Li
>            Priority: Major
>              Labels: auto-deprioritized-critical, auto-deprioritized-major
>             Fix For: 1.17.0
>
>
> Currently, the time interval join will produce data with rowtime later than 
> watermark. If we use the rowtime again in downstream, e.g. window 
> aggregation, we'll lose some data.
>  
> reported from user-zh: 
> [http://apache-flink.147419.n8.nabble.com/Re-flink-interval-join-tc4458.html#none]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
