Hi, I am reading data from a Hive table and writing it back into Hive after doing some transformations.
I changed spark.sql.shuffle.partitions to 2000, and since then the job completes quickly, but the main problem is that I now get 2000 files for each partition, each file about 10 MB. Is there a way to keep the same performance but write fewer files? I am trying repartition now (rough sketch below), but would like to know if there are any other options.

Thanks,
Asmath
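P.S. For context, here is roughly what I am trying. The table names and the target count of 50 partitions are just placeholders, not my real job:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-read-transform-write")
  .enableHiveSupport()   // needed to read/write Hive tables
  .getOrCreate()

// Keep the high parallelism for the shuffle-heavy transformation phase.
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// src_db.src_table / tgt_db.tgt_table are made-up names.
val transformed = spark.sql("SELECT * FROM src_db.src_table")
  // ... my transformations go here ...

// repartition(50) adds one extra shuffle, but the transformation stage
// still runs with 2000 tasks and at most ~50 files land in each Hive
// partition. coalesce(50) would avoid the extra shuffle, but it would
// also collapse the upstream stage down to 50 tasks, losing the speed-up.
transformed
  .repartition(50)
  .write
  .mode("overwrite")
  .insertInto("tgt_db.tgt_table")   // target table must already exist

My understanding is that repartition pays for one extra shuffle in exchange for keeping the 2000-way parallelism during the transformations, which is why I reached for it over coalesce. Is that the right trade-off here, or is there a better approach?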