Create the partitions on local HDFS first. Then copy them to your external 
location (is it an S3 bucket?) with a copy command. You should see a large 
gain in performance.
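
A minimal sketch of the copy step, assuming the partitions have already been written to a staging directory. It uses the local filesystem as a stand-in for HDFS and the external store, and copies each partition directory with its own worker thread. The function name `copy_partitions`, the `dt=...` partition names, and the `000000_0` file name are hypothetical and only there for illustration. On a real cluster a tool such as `hadoop distcp` would likely do this job instead.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_partitions(staging_dir, target_dir, max_workers=8):
    """Copy every partition directory under staging_dir into target_dir in parallel.

    Returns the sorted names of the partition directories that were copied.
    """
    staging = Path(staging_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # One unit of work per partition directory (e.g. "dt=2017-05-21").
    partitions = sorted(p for p in staging.iterdir() if p.is_dir())

    def copy_one(partition):
        shutil.copytree(partition, target / partition.name)
        return partition.name

    # pool.map re-raises any exception from a worker, so a failed copy is not silent.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(copy_one, partitions))


def demo():
    """Build three hypothetical partitions in a temp dir and copy them in parallel."""
    with tempfile.TemporaryDirectory() as root:
        staging = Path(root) / "staging"
        target = Path(root) / "target"
        for name in ("dt=2017-05-19", "dt=2017-05-20", "dt=2017-05-21"):
            part = staging / name
            part.mkdir(parents=True)
            (part / "000000_0").write_text("row\n")
        copied = copy_partitions(staging, target, max_workers=3)
        file_count = sum(1 for _ in target.rglob("000000_0"))
        return copied, file_count
```

Calling `demo()` copies the three sample partitions and returns their names together with the number of files found under the target. The thread count is a tuning knob, and the right value depends on how many concurrent writes the external store handles well.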


Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra, 411 
013
off: +91-20-49118448
 "When the solution is simple, God is answering…"

From: Rishi Aggarwal [mailto:ri...@hike.in]
Sent: 21 May 2017 AM 11:44
To: user@hive.apache.org
Subject: How to perform hive moveTask in parallel?


I am running an INSERT OVERWRITE query on an external table that is partitioned 
(192 partitions).

Running EXPLAIN, I see there are mainly two stages:
1.      MR stage (8 mappers and 10 reducers)
2.      Move Stage

The MR stage completes in 15-20 minutes.

The move stage takes about 3 hours.

Looking further, I found that the reducers write to a temporary location, and 
in the move stage the files are moved to the target location. The move from 
temp to target happens sequentially, and since I have 192 partitions and 10 
reducers, it takes 3 hours to move all the files.
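
As a rough sanity check on these numbers, the arithmetic can be sketched as below. It assumes every reducer writes one file into every partition, which is an upper bound; the actual file count is not stated here.

```python
partitions = 192
reducers = 10
move_seconds = 3 * 60 * 60  # "about 3 hours" for the move stage

# Assumption: each reducer writes one file into each partition (upper bound).
max_files = partitions * reducers

# With a purely sequential move, this is the average time spent per file.
seconds_per_file = move_seconds / max_files

print(max_files)         # 1920
print(seconds_per_file)  # 5.625
```

So a sequential move spends at least about 5.6 seconds per file on average, and more if fewer files are actually produced.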

Is there a way to do the move in parallel?

Hive Version: 1.2.1
