Hi Febian, Finally I have time to read the code, and it is brilliant it does provide exactly once guarantee。 However I still suggest to add the function that can close a file when checkpoint made. I noticed that there is an enhancement https://issues.apache.org/jira/browse/FLINK-9138 which can close file on a time based rollover, but it is not very accurate. My user case is we read data from message queue, write to HDFS, and our ETL team will use the data in HDFS. In the case, ETL need to know if all data is ready to be read accurately, so we use a counter to count how many data has been wrote, if the counter is equal to the number we received, we think HDFS file is ready. We send the counter message in a custom sink so ETL can know how many data has been wrote, but if use current BucketingSink, even through HDFS file is flushed, ETL may still cannot read the data. If we can close file during checkpoint, then the result is accurately. And for the HDFS small file problem, it can be controller by use bigger checkpoint interval.
I did take the BuckingSink code and adapt our case, but if it can be done in Flink, we can save to time to maintain our own branch. Thanks! Jeffrey -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/