Hi Vinay,

You are correct that the bulk formats only support OnCheckpointRollingPolicy.
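For reference, this is roughly what the bulk-format path looks like (a minimal sketch against the 1.9-era StreamingFileSink API; the output path and the Avro-generated MyRecord class are placeholders):

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    // The bulk-format builder exposes no withRollingPolicy(): part files
    // are rolled on every checkpoint (OnCheckpointRollingPolicy) and
    // cannot be rolled by size or time.
    StreamingFileSink<MyRecord> sink = StreamingFileSink
        .forBulkFormat(
            new Path("hdfs:///tmp/parquet-out"), // placeholder path
            ParquetAvroWriters.forSpecificRecord(MyRecord.class))
        .build();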
The reason is that Flink currently relies on the Hadoop writer for Parquet. Bulk formats keep important details about how they write the actual data (such as compression schemes, offsets, etc.) in metadata, and they write this metadata together with the file (e.g. Parquet writes it as a footer). The Hadoop writer gives no access to this metadata.

Given this, there is no way for Flink to checkpoint a part file safely without closing it. The solution would be to write our own writer instead of going through the Hadoop one, but as far as I know there are no concrete plans for this. (The row-format builder, by contrast, does let you roll by size; there is a sketch at the bottom of this mail.)

Cheers,
Kostas

On Tue, Oct 29, 2019 at 12:57 PM Vinay Patil <vinay18.pa...@gmail.com> wrote:
>
> Hi,
>
> I am not able to roll the files based on file size, as the bulk format
> only supports OnCheckpointRollingPolicy.
>
> One way is to write a custom StreamingFileSink and provide a
> RollingPolicy, as the RowFormatBuilder does. Is this the correct way to
> go ahead?
>
> Another way is to write a ParquetEncoder and use the RowFormatBuilder.
>
> P.S. Curious to know why the RollingPolicy was not exposed in the case
> of the BulkFormat?
>
> Regards,
> Vinay Patil
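P.S. For contrast, size-based rolling already works on the row-format side. Below is a minimal sketch against the 1.9-era StreamingFileSink API; the output path, encoder choice, and thresholds are placeholders:

    import java.util.concurrent.TimeUnit;

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

    // Row formats can flush records one at a time, so the sink is free to
    // roll part files on size, time, or inactivity, not only on checkpoints.
    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("hdfs:///tmp/row-out"), new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(
            DefaultRollingPolicy.create()
                .withMaxPartSize(128 * 1024 * 1024)                   // roll at ~128 MB
                .withRolloverInterval(TimeUnit.MINUTES.toMillis(15))  // or every 15 min
                .withInactivityInterval(TimeUnit.MINUTES.toMillis(5)) // or after 5 idle min
                .build())
        .build();

This is also why a ParquetEncoder for the row-format builder would not help: the Encoder interface assumes each record can be written and flushed independently, whereas Parquet buffers row groups and writes its footer only on close.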