Hi Everyone,
Does anyone know the best practice for writing Parquet files from
Spark?

When my Spark app writes data to Parquet, the output directory ends up
containing heaps of very small Parquet files (such as
e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet). Each Parquet file is only
about 15 KB.
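
For context, the write is roughly of this shape; the app name, input
source, and paths below are just placeholders, not the real ones:

    // rough Scala sketch of the write that produces the small files
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-writer").getOrCreate()

    // upstream DataFrame (the JSON source here is just illustrative)
    val df = spark.read.json("hdfs:///data/input/")

    // plain write, one Parquet file per partition of df
    df.write
      .mode("overwrite")
      .parquet("hdfs:///data/output/")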

Should it instead write larger files (for example around 128 MB each)
with an appropriate number of output files?
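
What I had in mind is something along these lines, i.e. reducing the
number of output partitions before the write, or capping the records per
file; the partition count and record limit below are just example values,
not recommendations:

    // coalesce to fewer, larger output files
    df.coalesce(16)
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/output/")

    // or cap the number of records per output file (Spark 2.2+)
    df.write
      .option("maxRecordsPerFile", 1000000)
      .mode("overwrite")
      .parquet("hdfs:///data/output/")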

Has anyone noticed performance changes when varying the size of each
Parquet file?

Thanks,
Kevin.
