Hi,

Have you looked into File Compaction, which is supported on the Table/SQL
side? [1]
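
For reference, here's a minimal sketch of what enabling it could look like
(table name, schema, and bucket path are made up; compaction is triggered
when a checkpoint completes, so checkpointing must be enabled):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class CompactedParquetSink {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // compaction runs on checkpoint completion

        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // 'auto-compaction' merges the small per-checkpoint files into larger
        // ones before committing them; 'compaction.file-size' sets the target.
        tEnv.executeSql(
                "CREATE TABLE parquet_sink (" +
                "  id BIGINT," +
                "  payload STRING" +
                ") WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = 's3://my-bucket/output'," +
                "  'format' = 'parquet'," +
                "  'auto-compaction' = 'true'," +
                "  'compaction.file-size' = '128MB'" +
                ")");
    }
}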

Best regards,

Martijn

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#file-compaction

On Mon, 27 Dec 2021 at 16:10, Deepak Sharma <deepakmc...@gmail.com> wrote:

> I would suggest taking a look at CheckpointRollingPolicy.
> You need to extend it and override the default behaviors in your FileSink.
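>
> A rough, untested sketch (the class name and size threshold are made up;
> note that for bulk formats like Parquet, shouldRollOnCheckpoint() is final
> and always returns true, so part files will still roll at every checkpoint):
>
> import java.io.IOException;
> import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
> import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.CheckpointRollingPolicy;
>
> // Rolls a part file once it grows past a configured size.
> public class SizeBasedRollingPolicy<IN, BucketID>
>         extends CheckpointRollingPolicy<IN, BucketID> {
>
>     private final long maxPartSizeBytes;
>
>     public SizeBasedRollingPolicy(long maxPartSizeBytes) {
>         this.maxPartSizeBytes = maxPartSizeBytes;
>     }
>
>     @Override
>     public boolean shouldRollOnEvent(PartFileInfo<BucketID> partFileState, IN element)
>             throws IOException {
>         return partFileState.getSize() >= maxPartSizeBytes;
>     }
>
>     @Override
>     public boolean shouldRollOnProcessingTime(PartFileInfo<BucketID> partFileState,
>             long currentTime) {
>         return false; // no time-based rolling in this sketch
>     }
> }
>
> You can then plug it in via
> FileSink.forBulkFormat(...).withRollingPolicy(new SizeBasedRollingPolicy<>(...))
> when building the sink.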
>
> HTH.
>
> Thanks
> Deepak
>
> On Mon, Dec 27, 2021 at 8:13 PM Mathieu D <matd...@gmail.com> wrote:
>
>> Hello,
>>
>> We’re trying to use a Parquet file sink to output files in S3.
>>
>> When running in streaming mode, it seems that Parquet files are flushed
>> and rolled at each checkpoint. The result is a huge number of very small
>> Parquet files, which completely defeats the purpose of the format.
>>
>>
>> Is there a way to produce larger output Parquet files? Or does that come
>> only at the price of a very long checkpointing interval?
>>
>> Thanks for your insights.
>>
>> Mathieu
>>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>
