"Flink will only commit the kafka offsets when the data has been saved to S3" -> No. You can check the BucketingSink code: for that to be true, BucketingSink would have to depend on Kafka, which is not reasonable.
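As a rough illustration of why the two are independent, here is a toy simulation in plain Java. It is not Flink code: the class name, the `simulate` method, and the idea of triggering a checkpoint every N records (instead of on a timer) are all assumptions made for the sketch. It only shows that changing the bucket size changes how often the sink rolls a file, while the offsets recorded at checkpoints stay the same.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in Flink.
public class CheckpointVsBucketDemo {

    /** Outcome of one simulated run. */
    static final class Result {
        // Source offsets recorded each time a checkpoint fires.
        final List<Long> checkpointedOffsets = new ArrayList<>();
        // Number of times the sink filled a bucket and rolled it.
        int bucketFlushes = 0;
    }

    /**
     * Reads `records` records with offsets 1..records.
     * A checkpoint fires every `checkpointEvery` records (a stand-in for a
     * time-based checkpoint interval); the sink rolls a bucket every
     * `bucketSize` records. Neither counter looks at the other.
     */
    static Result simulate(int records, int checkpointEvery, int bucketSize) {
        Result result = new Result();
        int buffered = 0;
        for (long offset = 1; offset <= records; offset++) {
            // Sink side: buffer the record, roll the bucket when it is full.
            buffered++;
            if (buffered == bucketSize) {
                result.bucketFlushes++;
                buffered = 0;
            }
            // Source side: the checkpoint schedule alone decides when
            // the current offset is recorded.
            if (offset % checkpointEvery == 0) {
                result.checkpointedOffsets.add(offset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result smallBuckets = simulate(1000, 100, 10);
        Result largeBuckets = simulate(1000, 100, 500);
        System.out.println("small buckets: flushes=" + smallBuckets.bucketFlushes
                + ", checkpoints=" + smallBuckets.checkpointedOffsets.size());
        System.out.println("large buckets: flushes=" + largeBuckets.bucketFlushes
                + ", checkpoints=" + largeBuckets.checkpointedOffsets.size());
    }
}
```

Both runs record the same offsets at the same points, even though the number of bucket flushes differs.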
Flink stores its checkpoints in its configured state backend (for example on the disk of each worker), not in Kafka. (Kafka Streams, the other streaming API, which is provided by Kafka itself, stores its checkpoints back to Kafka.) So the bucket size doesn't affect the commit frequency.

Best,
Sendoh

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/