Hello,

We’re trying to use a Parquet file sink to write output files to S3.

When running in Streaming mode, it seems that the Parquet files are flushed and
rolled at each checkpoint. The result is an enormous number of very small
Parquet files, which completely defeats the purpose of the format.


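For context, our setup looks roughly like the sketch below. This is a simplified illustration against Flink's FileSink bulk-format API, not our exact code; the event type, bucket, and path are placeholders.

    import org.apache.flink.connector.file.sink.FileSink;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

    public class ParquetSinkSketch {

        // Hypothetical event type, for illustration only.
        public static class Event {
            public String id;
            public long timestamp;
        }

        public static FileSink<Event> buildSink() {
            return FileSink
                    .forBulkFormat(
                            new Path("s3://my-bucket/output"), // placeholder bucket/path
                            AvroParquetWriters.forReflectRecord(Event.class))
                    // Bulk-encoded formats roll on every checkpoint, which is
                    // why each checkpoint produces a new (small) file.
                    .withRollingPolicy(OnCheckpointRollingPolicy.build())
                    .build();
        }
    }

As far as I understand, bulk-encoded formats like Parquet only support a rolling policy that rolls on checkpoint, which would explain the behavior above.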
Is there a way to produce larger output Parquet files, or does that come only
at the price of a very long checkpointing interval?

Thanks for your insights.

Mathieu
