[ 
https://issues.apache.org/jira/browse/FLINK-36664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903042#comment-17903042
 ] 

Thomas Cooper commented on FLINK-36664:
---------------------------------------

Instead of pasting screenshots, could you have used {{{{code:java}}}} 
blocks? The screenshots make this issue very hard to parse.



> [Window]The window with window_offset will lose data.
> -----------------------------------------------------
>
>                 Key: FLINK-36664
>                 URL: https://issues.apache.org/jira/browse/FLINK-36664
>             Project: Flink
>          Issue Type: Bug
>            Reporter: kaitian
>            Assignee: kaitian
>            Priority: Major
>              Labels: pull-request-available, window
>         Attachments: image-2024-11-06-20-20-34-562.png, 
> image-2024-11-06-20-23-53-184.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When setting the offset for the window, data is lost because the triggering 
> window time is not aligned.
>  
> When SlicingWindowOperator processes a watermark (processWatermark), it 
> records the time at which the next window is triggered 
> (nextTriggerWatermark):
> !image-2024-11-06-20-20-34-562.png!
> The calculation first computes the start time of the window that contains 
> the watermark, but the offset passed in for this computation is 0:
> !image-2024-11-06-20-23-53-184.png!
> That is to say, the window triggering time does not take into account the 
> window offset.
> A nextTriggerWatermark that is too small is harmless. However, when 
> watermark-offset>0 and (watermark-offset)%window_size > watermark%window_size, 
> the nextTriggerWatermark is too large (offset and window_size are 
> constants). For 0 < offset < window_size this condition holds exactly when 
> watermark%window_size < offset, so for a uniformly random watermark the 
> probability that the nextTriggerWatermark is too large is 
> offset/window_size (50% when the offset is half the window size).
> When the nextTriggerWatermark is too large, a processWatermark call that 
> should have called flushWindowBuffer does not, so the currently triggered 
> window (say, for key1) contains less data than it should. By the time a 
> later processWatermark does call flushWindowBuffer, the watermark has moved 
> forward, so key1 is treated as expired data and no timer is registered for 
> it. Subsequent processWatermark calls therefore never compute key1, and the 
> data is lost.
>  
> I have written a UT to prove this bug, using a tumbling window with 
> window_size 3000 and offset 1000. On processWatermark(3000), the correctly 
> calculated nextTriggerProgress = 3000 - (3000-1000)%3000 + 3000-1 = 3999, but 
> because the code does not consider the offset, nextTriggerProgress = 3000 - 
> (3000-0)%3000 + 3000-1 = 5999, which is too large, so data is lost.
> !https://intranetproxy.alipay.com/skylark/lark/0/2024/png/101856358/1730893082766-4ea5d356-2b19-436d-be46-093f70e445cd.png!
> This UT asserts the incorrect output and still passes. You can see that the 
> data in the UT is lost and is not calculated in any subsequent 
> processWatermark.
>  
> Repair suggestion:
> pass window_offset when calculating nextTriggerWatermark
>  
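
For readers who cannot view the attached screenshots, the arithmetic in the
quoted description can be reproduced with a small standalone sketch. This is
not the Flink source: the class and method names (NextTriggerSketch,
windowStart, nextTrigger, countTooLarge) are hypothetical, and the rule of
moving the trigger forward by one window when it does not exceed the
watermark is an assumption made for the sketch. It only illustrates how
passing offset 0 instead of the configured offset changes the result.

```java
class NextTriggerSketch {

    // Start of the window containing `timestamp`, for windows of length
    // `size` whose boundaries are shifted by `offset`.
    static long windowStart(long timestamp, long offset, long size) {
        long remainder = (timestamp - offset) % size;
        // Java's % may return a negative remainder, so normalise it.
        return remainder < 0 ? timestamp - (remainder + size) : timestamp - remainder;
    }

    // Assumed trigger rule: the last timestamp of the window containing the
    // watermark, or of the following window if that is not past the watermark.
    static long nextTrigger(long watermark, long offset, long size) {
        long trigger = windowStart(watermark, offset, size) + size - 1;
        return trigger > watermark ? trigger : trigger + size;
    }

    // Number of watermarks in [0, size) for which ignoring the offset
    // yields a trigger later than the offset-aware one.
    static long countTooLarge(long offset, long size) {
        long count = 0;
        for (long w = 0; w < size; w++) {
            if (nextTrigger(w, 0L, size) > nextTrigger(w, offset, size)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long size = 3000L;
        long offset = 1000L;
        long watermark = 3000L;

        System.out.println(nextTrigger(watermark, offset, size)); // 3999
        System.out.println(nextTrigger(watermark, 0L, size));     // 5999
        System.out.println(countTooLarge(offset, size));          // 1000
    }
}
```

With window_size 3000 and offset 1000, the offset-aware call returns 3999 and
the zero-offset call returns 5999, matching the numbers in the description.
Under the assumptions of this sketch, the trigger is too late for 1000 of the
3000 possible watermark positions within one window.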



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
