Use repartition.
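Something along these lines (a rough Scala sketch, untested; the query and
the table names source_table / target_table are placeholders, not your
actual job):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // Keep the shuffle parallelism high for the transformations themselves.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

    // Placeholder for your actual Hive query / transformations.
    val result = spark.sql("SELECT * FROM source_table")

    // Collapse to a small number of partitions just before the write, so
    // each Hive partition gets a few larger files instead of 2000 x 10 MB.
    result
      .repartition(10)
      .write
      .mode("overwrite")
      .insertInto("target_table")

Note that repartition(10) adds one extra shuffle but leaves the upstream
stages at 2000 tasks; coalesce(10) would avoid that shuffle, but it can also
pull the upstream transformations down to 10 tasks, which would give back
the speedup you got from raising spark.sql.shuffle.partitions.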
On 13-Oct-2017 9:35 AM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> I am reading a Hive query and writing the data back into Hive after doing
> some transformations.
>
> I changed the setting spark.sql.shuffle.partitions to 2000, and since then
> the job completes fast, but the main problem is that I am getting 2000
> files for each partition, each about 10 MB in size.
>
> Is there a way to get the same performance but write fewer files?
>
> I am trying repartition now, but would like to know if there are any
> other options.
>
> Thanks,
> Asmath