Hi Fabian, Kostas,

From the description of this ticket, https://issues.apache.org/jira/browse/FLINK-9753, I understand that my output Parquet files written with StreamingFileSink should now be able to span multiple checkpoints. However, when I tried it (see the code snippet below), I still see one "part-X-X" file created after each checkpoint. Is there any other configuration that I'm missing?
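To make the setup concrete, this is roughly how checkpointing is enabled in the job (a minimal sketch; the 5-minute interval is illustrative, not the actual job configuration). With a bulk format, the sink rolls the in-progress part file on every successful checkpoint, so the part-file count follows the checkpoint cadence:

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // With bulk formats (e.g. Parquet), StreamingFileSink rolls the
    // in-progress part file on every checkpoint, so a 5-minute
    // checkpoint interval yields roughly one part file per bucket
    // every 5 minutes.
    env.enableCheckpointing(5L * 60L * 1000L) // illustrative interval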
BTW, I have another question, regarding StreamingFileSink.BulkFormatBuilder.withBucketCheckInterval(). As per the notes at the end of the StreamingFileSink page <https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/streamfile_sink.html>, bulk encoding can only be combined with OnCheckpointRollingPolicy, which rolls on every checkpoint, so setting that check interval makes no difference. Why, then, is the withBucketCheckInterval method exposed on the bulk-format builder?

Thanks and best regards,
Averell

    def buildSink[T <: MyBaseRecord](outputPath: String)(implicit ct: ClassTag[T]): StreamingFileSink[T] = {
      StreamingFileSink
        .forBulkFormat(new Path(outputPath), ParquetAvroWriters.forReflectRecord(ct.runtimeClass))
        .asInstanceOf[StreamingFileSink.BulkFormatBuilder[T, String]]
        .withBucketCheckInterval(5L * 60L * 1000L) // 5 minutes
        .withBucketAssigner(new DateTimeBucketAssigner[T]("yyyy-MM-dd--HH"))
        .build()
    }
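For contrast with the bulk-format snippet above, withBucketCheckInterval does matter for row-encoded formats, where a rolling policy such as DefaultRollingPolicy is evaluated on that interval. A minimal sketch (the path, sizes, and intervals are illustrative assumptions):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy

    val rowSink: StreamingFileSink[String] = StreamingFileSink
      .forRowFormat(new Path("/tmp/output"), new SimpleStringEncoder[String]("UTF-8"))
      // The rolling policy is checked on this interval, so it controls
      // how promptly size- and time-based rolls take effect.
      .withBucketCheckInterval(60L * 1000L)
      .withRollingPolicy(
        DefaultRollingPolicy.create()
          .withMaxPartSize(128L * 1024L * 1024L)   // roll at 128 MB
          .withRolloverInterval(15L * 60L * 1000L) // or after 15 minutes
          .build())
      .build()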