[ https://issues.apache.org/jira/browse/FLINK-36664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903042#comment-17903042 ]
Thomas Cooper commented on FLINK-36664: --------------------------------------- Instead of pasting screen shots could you not have used {{{{code:java}}}} blocks? It makes this issue very hard to parse. > [Window]The window with window_offset will lose data. > ----------------------------------------------------- > > Key: FLINK-36664 > URL: https://issues.apache.org/jira/browse/FLINK-36664 > Project: Flink > Issue Type: Bug > Reporter: kaitian > Assignee: kaitian > Priority: Major > Labels: pull-request-available, window > Attachments: image-2024-11-06-20-20-34-562.png, > image-2024-11-06-20-23-53-184.png > > Original Estimate: 1h > Remaining Estimate: 1h > > When setting the offset for the window, data is lost because the triggering > window time is not aligned. > > When SlicingWindowOperator processesWatermarker, it records the time when the > next window is triggered (nextTriggerWatermark): > !image-2024-11-06-20-20-34-562.png! > The calculation method is to first calculate the begin time of the window > where the watermark is located, but the offset passed in during the > calculation is 0: > !image-2024-11-06-20-23-53-184.png! > That is to say, the window triggering time does not take into account the > window offset. > It is OK if the nextTriggerWatermark is too small. When watermark-offset>0, > if (watermark-offset)%window_size > watermark%window_size is satisfied, the > nextTriggerWatermark will be too large, where offset and window_size are > constants. If the watermark is completely random, it is easy to prove that > there is a 50% probability that the nextTriggerWatermark will be too large. > When the nextTriggerWatermark is too large, a processWatermark should have > flushWindowBuffer but was not triggered, resulting in less data in the > currently triggered window (assuming it is key1). When the next > processWatermark triggers flushWindowBuffer, since the Watermark has moved > forward, key1 will be regarded as expired data and the timer will not be > registered. That is to say, the subsequent processWatermark will no longer > calculate key1, and data will be lost. > > I have writen a UT to prove this bug: > window_size 3000, offset 1000, tumping window > window_size 3000, offset 1000. When processWatermark(3000), the normally > calculated nextTriggerProgress = 3000 - (3000-1000)%3000 + 3000-1 = 3999, but > because the code does not consider the offset, nextTriggerProgress = 3000 - > (3000-0)%3000 + 3000-1 = 5999, which is too large.We will lose data. > !https://intranetproxy.alipay.com/skylark/lark/0/2024/png/101856358/1730893082766-4ea5d356-2b19-436d-be46-093f70e445cd.png! > This wrong UT will pass. You can see that the data in the UT is lost and will > not be calculated in any subsequent processWatermark. > > Repair suggestion: > pass window_offset when calculating nextTriggerWatermark > -- This message was sent by Atlassian Jira (v8.20.10#820010)