Hi everyone,

Does anyone know what the best practice is for writing Parquet files from Spark?
When my Spark app writes data out as Parquet, the output directory ends up with heaps of very small Parquet files (such as e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet), each only about 15 KB.

Should it instead be writing bigger chunks (say around 128 MB each) across a proper number of files? Has anyone seen any performance changes from changing the size of each Parquet file?
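To make the question concrete, is something along these lines the right direction? This is only a rough sketch, not my actual job; the input/output paths, the partition count of 8, and the maxRecordsPerFile value are placeholders I made up:

    import org.apache.spark.sql.SparkSession

    object ParquetWriteSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-write-sketch")
          .getOrCreate()

        // Optionally cap the number of records per output file so Spark splits
        // oversized partitions; the value here is a guess and would need tuning.
        spark.conf.set("spark.sql.files.maxRecordsPerFile", 1000000)

        // Placeholder input; the real job reads a much larger dataset.
        val df = spark.read.json("/data/input")

        // Collapse to a small, fixed number of partitions before writing so each
        // Parquet file comes out closer to ~128 MB rather than ~15 KB.
        // coalesce() avoids a full shuffle; repartition(n) would rebalance evenly.
        df.coalesce(8) // placeholder: roughly total data size / 128 MB
          .write
          .mode("overwrite")
          .parquet("/data/output/parquet")

        spark.stop()
      }
    }

In particular, I am not sure whether coalesce/repartition before the write, or the maxRecordsPerFile setting, is the preferred way to control output file size.

Thanks, Kevin.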