kaitian created FLINK-36663:
-------------------------------

             Summary: [Window]The first processWatermark() after Sliding Window 
restore may have extra expired data.
                 Key: FLINK-36663
                 URL: https://issues.apache.org/jira/browse/FLINK-36663
             Project: Flink
          Issue Type: Bug
            Reporter: kaitian
         Attachments: image-2024-11-06-19-43-58-569.png, 
image-2024-11-06-19-47-37-225.png, image-2024-11-06-19-50-35-362.png

The root cause is that the currentWatermark of TimerService is not restored 
after restore.

!image-2024-11-06-19-43-58-569.png!

This will cause problems when using the sliding window, such as hopping window. 
Because when flushing WindowBuffer, it is necessary to determine whether the 
current <key, slice_end> is expired. Here, the currentWatermark in the 
TimerService is used, and the currentWatermark in the TimerService will only be 
updated to the <currentWatermark saved in the window> when advance.

!image-2024-11-06-19-47-37-225.png!

This will cause that after restore and before the first advance, if 
processElement(<key, slice_end> ), the slice_end has been triggered, but 
because this slice is included in other untriggered windows, the data <key, 
slice_end> will be written to the windowBuffer. (At this time, <the 
currentWatermark in the window> is used to determine whether the data is 
expired, and <the currentWatermark in the window> is restored):

!image-2024-11-06-19-50-35-362.png!

However, because <the currentWatermark in timeService> is used when 
flushWindowBuffer, it is determined that the slice has not expired and the 
timer is registered.
This will cause an expired slice_end to be registered, resulting in an expired 
window being triggered and output extra data.

 

 

I have writen a UT to prove this problem:

!https://aone.alibaba-inc.com/v2/api/workitem/adapter/file/url?fileIdentifier=workitem%2Falibaba%2F875140%2F1730810319419%E6%88%AA%E5%B1%8F2024-11-05%2020.38.19.png!

Triggering restore will have different results. You can remove the restore part 
in the red box and the test will pass.

use hopping window, window size = 3000, we have window range:[-1000,2000), [0, 
3000]

processWatermark (currentWatermark = 2001)

processElement:     the expired record <"key2", 1, fromEpochMillis(1500L)>

if no restore after processWatermark (currentWatermark = 2001):

the expired record will not output in window range [-1000,2000).

 

if restore after processWatermark (currentWatermark = 2001):

the expired record will not output in window range [-1000,2000).

 

 

Repair suggestions:

While restoring the window currentWatermark, restore the currentWatermark in 
the timeService

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to