You can also try coalesce, as it avoids a full shuffle and only merges existing partitions.
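Something like this, for example (a minimal sketch; the table names and the target count of 50 are placeholders, not from your job):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HiveWriteExample")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder query standing in for the real transformations.
    val transformed = spark.sql("SELECT * FROM source_table")

    // coalesce(n) only merges existing partitions, so the upstream
    // stages still run with spark.sql.shuffle.partitions (2000) tasks;
    // only the final write is reduced to n files.
    transformed
      .coalesce(50) // target file count; tune toward ~128 MB per file
      .write
      .mode("overwrite")
      .saveAsTable("target_table")

The trade-off is that coalesce can make the last stage less parallel, so if the write itself becomes slow, repartition (with its extra shuffle) may still be the better option.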

Regards,
Tushar Adeshara
Technical Specialist – Analytics Practice
Cell: +91-81490 04192
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com


________________________________
From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
Sent: 13 October 2017 09:35
To: user @spark
Subject: Spark - Partitions

Hi,

I am reading the results of a Hive query and writing the data back into Hive after doing some transformations.

I changed the setting spark.sql.shuffle.partitions to 2000, and since then the job completes fast, but the main problem is that I am getting 2000 files for each partition, each file about 10 MB in size.

Is there a way to get the same performance but write a smaller number of files?

I am trying repartition now (sketched below), but would like to know if there are any other options.
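For reference, this is roughly the shape of the job (a sketch; table names are placeholders, and it assumes the built-in spark session from spark-shell):

    // repartition(n) triggers a full shuffle into n partitions, so the
    // write produces n larger files instead of 2000 small ones.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

    val result = spark.sql("SELECT * FROM source_table") // plus transformations

    result
      .repartition(200) // placeholder target partition count
      .write
      .mode("overwrite")
      .saveAsTable("target_table")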

Thanks,
Asmath
