Thank you, Fabian! The HDFS small-file problem can be avoided with a large checkpoint interval.
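For reference, a larger checkpoint interval can be set like this (a minimal sketch; the 10-minute value is just an illustrative choice, not a recommendation from this thread):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointIntervalExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint every 10 minutes (600,000 ms). With BucketingSink,
        // in-progress files are finalized on checkpoint completion, so a
        // longer interval yields fewer, larger files on HDFS.
        env.enableCheckpointing(600_000L);
    }
}
```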
Meanwhile, there is a potential data-loss problem in the current BucketingSink. Say we consume data from Kafka: when a checkpoint is triggered, the Kafka offset is updated, but the in-progress file in the BucketingSink remains open. If Flink crashes after that, is the data in the in-progress file lost? Am I right?

-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/