dynamic coalesce to pick file size

Maurin Lenglart Tue, 26 Jul 2016 12:03:07 -0700

Hi,
I am running a SQL query that returns a DataFrame. Then I write the result of
the query using “df.write”, but the output gets written as a lot of small
files (~100 files of about 200 KB each). So now I am calling “.coalesce(2)”
before the write.
But the number “2” that I picked is static. Is there a way of dynamically
picking the number depending on the desired file size? (Around 256 MB would be
perfect.)


I am running Spark 1.6 on CDH using YARN, and the files are written in Parquet
format.
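One possible approach, sketched here under assumptions and not as a built-in Spark feature: estimate the total output size in bytes, divide it by the target file size, round up, and pass the result to coalesce. The names `num_output_files` and `TARGET_FILE_BYTES` are hypothetical, and how to obtain `total_bytes` is left as an assumption (for example, the size of a previous run's Parquet output). The in-memory size of a DataFrame may differ a lot from its compressed Parquet size on disk.

```python
import math

# Hypothetical target size per output file (256 MB, as in the question).
TARGET_FILE_BYTES = 256 * 1024 * 1024


def num_output_files(total_bytes, target_bytes=TARGET_FILE_BYTES):
    """Return how many partitions to coalesce to so that each output file
    is roughly target_bytes in size, never fewer than one.

    total_bytes is an *estimate* supplied by the caller (assumption), e.g.
    the on-disk size of an earlier Parquet output of the same query.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    # float() keeps the division exact on Python 2 as well; int() makes
    # the result an int on both Python 2 and 3.
    return max(1, int(math.ceil(total_bytes / float(target_bytes))))


# Hypothetical usage with PySpark (output_path is a placeholder):
#   n = num_output_files(estimated_bytes)
#   df.coalesce(n).write.parquet(output_path)
```

With ~100 files of 200 KB (about 20 MB in total) this returns 1, and with 1 GB it returns 4. Rounding up means files come out at or slightly below the target rather than above it.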

Thanks

