Hi all,

I was reading about optimal Parquet file size and HDFS block size.
The ideal situation for Parquet is when its block size (and thus the
maximum size of each row group) is equal to the HDFS block size, so that
each row group fits into a single HDFS block. By default, however, the
size of the files that Flink writes depends on the output parallelism, so
I don't see how to achieve that alignment.
Is that feasible?
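
To make the question more concrete, below is a minimal sketch of the kind
of sink I have in mind (DataSet API, Avro GenericRecord records, and the
Hadoop-compatibility HadoopOutputFormat wrapper around Parquet's output
format). The class name, the schema handling and the 128 MB fallback are
only illustrative assumptions; the relevant part is setting the Parquet
block size to the HDFS block size:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.parquet.avro.AvroParquetOutputFormat;
import org.apache.parquet.hadoop.ParquetOutputFormat;

public class ParquetSinkSketch {

    /** Writes a DataSet of Avro records as Parquet, aligning the Parquet
     *  block size (max row group size) with the HDFS block size. */
    public static void writeAsParquet(DataSet<Tuple2<Void, GenericRecord>> records,
                                      Schema schema, String outputPath) throws Exception {
        Job job = Job.getInstance();

        // HDFS block size from the Hadoop config; 128 MB fallback if
        // hdfs-site.xml is not on the classpath (assumption for this sketch).
        long hdfsBlockSize = job.getConfiguration()
                .getLong("dfs.blocksize", 128L * 1024 * 1024);

        // Make the Parquet block size (and thus the maximum row group size)
        // equal to the HDFS block size.
        ParquetOutputFormat.setBlockSize(job, (int) hdfsBlockSize);

        AvroParquetOutputFormat.setSchema(job, schema);
        FileOutputFormat.setOutputPath(job, new Path(outputPath));

        // Wrap the Hadoop/Parquet output format so it can be used as a Flink sink.
        HadoopOutputFormat<Void, GenericRecord> parquetFormat =
                new HadoopOutputFormat<>(new AvroParquetOutputFormat<GenericRecord>(), job);

        records.output(parquetFormat);
    }
}

Even with the block sizes aligned like this, each parallel sink task still
writes its own file, and how large each of those files ends up being just
depends on how the records are distributed across the parallel instances.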

Best,
Flavio
