[ https://issues.apache.org/jira/browse/FLINK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819016#comment-16819016 ]

Biao Liu commented on FLINK-12172:
----------------------------------

Aha, I see what you mean now. However, you don't need to worry about this: the
checkpoint (savepoint) mechanism takes care of it. If pending files are
abandoned, restoring from the checkpoint makes sure the sink reverts to the
correct state, and the records in the abandoned pending files are reproduced
into a new file (in your case).
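To make this lifecycle concrete, here is a hypothetical toy model in Java. The
class PendingFileModel and its methods are illustrative names, not Flink's API.
It assumes a simplified sink in which a closed part file becomes pending,
pending files are recorded under the id of the next checkpoint, and only files
recorded in a checkpoint are moved to finished, either when that checkpoint is
confirmed or when the job is restored from it. In-progress files, valid-length
truncation and bucket paths are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT Flink's BucketingSink API: it only tracks which
// part files end up finished across checkpoints, savepoints and restores.
class PendingFileModel {

    /** Snapshotted state: pending files recorded under each checkpoint id. */
    static final class State {
        final TreeMap<Long, List<String>> pendingPerCheckpoint = new TreeMap<>();
    }

    final List<String> pending = new ArrayList<>();  // closed, not yet in any checkpoint
    final List<String> finished = new ArrayList<>(); // files this instance has committed
    final State state = new State();

    /** A part file is closed (rolled over) and becomes pending. */
    void roll(String file) {
        pending.add(file);
    }

    /** Record the current pending files under this checkpoint id; return a copy of the state. */
    State snapshot(long checkpointId) {
        state.pendingPerCheckpoint.put(checkpointId, new ArrayList<>(pending));
        pending.clear();
        State copy = new State();
        for (Map.Entry<Long, List<String>> e : state.pendingPerCheckpoint.entrySet()) {
            copy.pendingPerCheckpoint.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }

    /** Checkpoint confirmed: finalize pending files of this and all earlier checkpoints. */
    void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> confirmed = state.pendingPerCheckpoint.headMap(checkpointId, true);
        for (List<String> files : confirmed.values()) {
            finished.addAll(files);
        }
        confirmed.clear(); // clearing the view also removes the entries from the backing map
    }

    /**
     * Restore: files recorded in the snapshot are finalized. Files that became
     * pending after the snapshot are not in it; they are abandoned, and their
     * records are replayed from the source into new files.
     */
    static PendingFileModel restore(State snapshot) {
        PendingFileModel sink = new PendingFileModel();
        for (List<String> files : snapshot.pendingPerCheckpoint.values()) {
            sink.finished.addAll(files);
        }
        return sink;
    }
}
```

For example, a file rolled before savepoint 2 is finalized when the job is
restored from that savepoint, while a file rolled after it is never finalized
and its records are simply written again by the restored job.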

There is also an inaccuracy in your question: "they contain part of the
computation for a window". If a record has been written into a pending file, it
has already been purged from the window and no longer belongs to it.
Furthermore, if a record belongs to several windows, the copy written by
BucketingSink and the copies still held by the other windows are separate
copies. So a record cannot both belong to an unfinished window and have already
been processed by a BucketingSink.
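A small sketch of the window side, again a hypothetical toy model rather than
Flink code (SlidingWindowCopies and its methods are illustrative names). The
window-start arithmetic follows the sliding event-time assigner, simplified to
zero offset and non-negative timestamps, and each window buffers its own entry
for the record. Firing and purging one window hands that window's entry to the
sink but leaves the overlapping window's entry untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model, NOT Flink code: each window keeps its own entry for a
// record, and firing a window purges only that window's contents.
class SlidingWindowCopies {
    final TreeMap<Long, List<String>> windows = new TreeMap<>(); // window start -> buffered records
    final long size;
    final long slide;

    SlidingWindowCopies(long size, long slide) {
        this.size = size;
        this.slide = slide;
    }

    /** Start times of every sliding window containing ts (assumes ts >= 0, offset 0). */
    List<Long> assign(long ts) {
        List<Long> starts = new ArrayList<>();
        for (long start = ts - (ts % slide); start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Buffer a separate entry for the record in each window it belongs to. */
    void add(String record, long ts) {
        for (long start : assign(ts)) {
            windows.computeIfAbsent(start, s -> new ArrayList<>()).add(record);
        }
    }

    /** Fire and purge one window; the returned contents are what a sink would receive. */
    List<String> fireAndPurge(long start) {
        List<String> out = windows.remove(start);
        return out == null ? new ArrayList<>() : out;
    }
}
```

With size 10 and slide 5, a record at timestamp 7 lands in the windows starting
at 5 and 0; firing window 0 emits the record, while window 5 still holds its
own entry until it fires.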

You can learn more about windows in the documentation [1]. It is also well
worth reading the source code of BucketingSink; it will help you understand
checkpointing better.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/windows.html


> Flink fails to close pending BucketingSink
> ------------------------------------------
>
>                 Key: FLINK-12172
>                 URL: https://issues.apache.org/jira/browse/FLINK-12172
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem
>    Affects Versions: 1.7.2
>            Reporter:  Mario Georgiev
>            Priority: Major
>
> Hello,
> The problem is if you have a BucketingSink, the following case may occur :
> Let's say you have a 2019-04-12--12 bucket created with several files inside 
> which are pending/finished
>  You create a savepoint and shut down the job
>  After an hour for instance you start the job from the savepoint and a new 
> bucket is created, 2019-04-16 for instance. 
>  The problem is that the .pending ones from the old buckets seem to never be 
> moved to finished state if there is a new hourly bucket created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
