You should first create the partitions locally in HDFS, then copy them to your external location with a copy command (is it an S3 bucket?). You should see a significant improvement in performance.
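The suggestion above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: it assumes the insert overwrite has already written the partitions to a staging directory in HDFS, that the external location is reachable through `hadoop distcp` (for example an `s3a://` path, which the thread does not confirm), and that partition directories follow the usual `key=value` naming. The function names, paths and worker count are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_copy_command(staging_root, target_root, partition):
    """Build a `hadoop distcp` command that copies one partition directory."""
    src = f"{staging_root.rstrip('/')}/{partition}"
    dst = f"{target_root.rstrip('/')}/{partition}"
    # -overwrite replaces existing files at the destination, which roughly
    # mirrors the insert overwrite semantics of the original query.
    return ["hadoop", "distcp", "-overwrite", src, dst]


def copy_partitions(staging_root, target_root, partitions,
                    max_workers=8, run=subprocess.run):
    """Copy several partition directories concurrently.

    `run` is injectable so the function can be exercised without a cluster;
    by default each command is executed with subprocess.run(check=True),
    so a failed copy raises CalledProcessError.
    """
    commands = [build_copy_command(staging_root, target_root, p)
                for p in partitions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces evaluation so any exception from a copy is raised here.
        list(pool.map(lambda cmd: run(cmd, check=True), commands))
    return commands
```

A call might look like `copy_partitions("hdfs:///tmp/staging/my_table", "s3a://my-bucket/my_table", ["dt=2017-05-20", "dt=2017-05-21"])`, with all three arguments being placeholders. Since DistCp itself runs as a MapReduce job that copies files in parallel, a single `hadoop distcp` over the whole staging directory may already be enough; the thread pool only matters if per-partition copies are preferred. After the copy, the partitions still need to be registered in the metastore (for example with `MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`) if they are not already.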
Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.

"When the solution is simple, God is answering…"

From: Rishi Aggarwal [mailto:ri...@hike.in]
Sent: 21 May 2017 AM 11:44
To: user@hive.apache.org
Subject: How to perform hive moveTask in parallel?

I am running an insert overwrite query on an external table that is partitioned (192 partitions). Running explain shows that there are two main stages:

1. MR stage (8 mappers and 10 reducers)
2. Move stage

The MR stage completes in 15-20 minutes. The move stage takes about 3 hours. Looking further, I found that the reducers write to a temporary location, and the move stage then moves the files to the target location. The move from the temporary location to the target happens sequentially, and since I have 192 partitions and 10 reducers, it takes 3 hours to move all the files.

Is there a way to do the move in parallel?

Hive Version: 1.2.1