The behavior of BucketingSink is not exactly what we want. If I understand correctly, when a checkpoint is requested, BucketingSink flushes its writer to make sure no data is lost, but it neither closes the current file nor rolls a new file after the checkpoint. In the case of HDFS, if the file length is not updated on the NameNode (by closing the file, or by explicitly updating the length), MapReduce and other data analysis tools will not see the new data. This is not what we want. I would also like to open a new file for each checkpoint period so that the HDFS file is made persistent, because we have hit some bugs in the flush/append HDFS use case.
Is there any way to make BucketingSink roll a file on each checkpoint? Thanks in advance. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
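For context, here is roughly how our sink is set up today. This is only a sketch: the path, port, and intervals are placeholders, and the time-based rollover (setBatchRolloverInterval, which I believe was added in Flink 1.6) is driven by wall-clock time, not by checkpoint completion, so it only approximates the per-checkpoint rolling we actually want:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class RollingSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // checkpoint every minute

        // Placeholder source for the sketch.
        DataStream<String> input = env.socketTextStream("localhost", 9999);

        // Placeholder HDFS path.
        BucketingSink<String> sink = new BucketingSink<>("hdfs:///data/output");
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
        sink.setBatchSize(128L * 1024L * 1024L); // size-based roll at 128 MB

        // Rolls on elapsed wall-clock time, not on checkpoints, so a file
        // can still span several checkpoints (or roll mid-checkpoint).
        sink.setBatchRolloverInterval(60_000L);

        input.addSink(sink);
        env.execute("bucketing-sink-roll-sketch");
    }
}
```

Even with the rollover interval set to match the checkpoint interval, the two timers are independent, which is why I am asking whether rolling can be tied to the checkpoint itself.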