Hi all,

I was reading about optimal Parquet file size and HDFS block size.
The ideal situation for Parquet is when its block size (and thus the
maximum size of each row group) is equal to the HDFS block size, so that
each row group fits into a single HDFS block. By default, however, the
size of the files that Flink writes depends on the output parallelism, so
I don't see how to achieve that alignment.
Is that feasible?
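
To make the question more concrete, below is a minimal sketch of the kind
of sink I have in mind (DataSet API, Avro GenericRecord records, and the
Hadoop-compatibility HadoopOutputFormat wrapper around Parquet's output
format). The class name, the schema handling and the 128 MB fallback are
only illustrative assumptions; the relevant part is setting the Parquet
block size to the HDFS block size:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;
import org.apache.parquet.hadoop.ParquetOutputFormat;

public class ParquetSinkSketch {

    /** Writes a DataSet of Avro records as Parquet, aligning the Parquet
     *  block size (max row group size) with the HDFS block size. */
    public static void writeAsParquet(DataSet<Tuple2<Void, GenericRecord>> records,
                                      Schema schema, String outputPath) throws Exception {
        Job job = Job.getInstance();

        // HDFS block size from the Hadoop config; 128 MB fallback if
        // hdfs-site.xml is not on the classpath (assumption for this sketch).
        long hdfsBlockSize = job.getConfiguration()
                .getLong("dfs.blocksize", 128L * 1024 * 1024);

        // Make the Parquet block size (and thus the maximum row group size)
        // equal to the HDFS block size.
        ParquetOutputFormat.setBlockSize(job, (int) hdfsBlockSize);

        AvroParquetOutputFormat.setSchema(job, schema);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));

        // Wrap the Hadoop/Parquet output format so it can be used as a Flink sink.
        HadoopOutputFormat<Void, GenericRecord> parquetFormat =
                new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

        records.output(parquetFormat);
    }
}

Even with the block sizes aligned like this, each parallel sink task still
writes its own file, and how large each of those files ends up being just
depends on how the records are distributed across the parallel instances.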

Best,
Flavio
