Hi,

I am reading from a Hive query and writing the data back into Hive after
doing some transformations.

I changed spark.sql.shuffle.partitions to 2000, and since then the job
completes fast, but the main problem is that I am getting 2000 files for
each partition, each about 10 MB in size.
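
For reference, this is roughly how I applied the setting (spark here is
the SparkSession):

  spark.conf.set("spark.sql.shuffle.partitions", "2000")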

Is there a way to get the same performance but write fewer files?

I am trying repartition now, but I would like to know if there are any
other options. Here is roughly what I am trying so far (a sketch only;
the table and column names are made up):
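
  // Rough sketch (names are illustrative; spark is the SparkSession):
  import spark.implicits._

  val df = spark.sql("SELECT * FROM src_db.src_table")  // read from Hive
  val transformed = df.filter($"amount" > 0)            // placeholder transformation

  // Shrink the number of output files right before the write.
  // coalesce() avoids an extra shuffle, but it can also reduce the
  // parallelism of the stage above it; if that slows the job down,
  // repartition(50) keeps the 2000-way shuffle for the transformations
  // and adds one extra exchange just for the write.
  transformed
    .coalesce(50)                                       // ~50 files instead of 2000
    .write
    .mode("overwrite")
    .insertInto("tgt_db.tgt_table")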

Thanks,
Asmath
