Hi, I am reading data from a Hive table and writing it back into Hive after doing some transformations.
I changed spark.sql.shuffle.partitions to 2000, and since then the job completes quickly, but the main problem is that I now get 2000 files for each partition, each file about 10 MB. Is there a way to keep the same performance but write fewer files? I am trying repartition now (rough sketch below), but would like to know if there are any other options.

Thanks,
Asmath
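P.S. For context, here is roughly what I am trying. The table names and the target count of 50 partitions are just placeholders, not my real job:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-read-transform-write")
  .enableHiveSupport()   // needed to read/write Hive tables
  .getOrCreate()

// Keep the high parallelism for the shuffle-heavy transformation phase.
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// src_db.src_table / tgt_db.tgt_table are made-up names.
val transformed = spark.sql("SELECT * FROM src_db.src_table")
  // ... my transformations go here ...

// repartition(50) adds one extra shuffle, but the transformation stage
// still runs with 2000 tasks and at most ~50 files land in each Hive
// partition. coalesce(50) would avoid the extra shuffle, but it would
// also collapse the upstream stage down to 50 tasks, losing the speed-up.
transformed
  .repartition(50)
  .write
  .mode("overwrite")
  .insertInto("tgt_db.tgt_table")   // target table must already exist

My understanding is that repartition pays for one extra shuffle in exchange for keeping the 2000-way parallelism during the transformations, which is why I reached for it over coalesce. Is that the right trade-off here, or is there a better approach?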