Sujeet: It's GCS, not S3. That might boost performance, but since my table
is partitioned, copying is not straightforward.

With Hive 2.1, files are moved in parallel, but partitions are still moved
one at a time.

For example, if there are 10 partitions and each partition has 100 files,
then the 100 files within a partition are moved in parallel, but only one
partition is moved at a time.
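
To make the two levels of parallelism concrete, here is a minimal Python
sketch. It is not Hive code: it uses local directories and a thread pool as
stand-ins, and the names move_partition, move_table and make_staging are
hypothetical. With partition_threads=1 it mimics the behaviour described
above (files in parallel, partitions one at a time); a larger value shows
what moving partitions in parallel would look like.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_partition(src: Path, dst: Path, file_threads: int = 4) -> int:
    """Move every file of one partition directory, files in parallel."""
    dst.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=file_threads) as pool:
        # list() waits for all moves and re-raises any worker exception.
        list(pool.map(lambda f: shutil.move(str(f), str(dst / f.name)), files))
    return len(files)


def move_table(staging: Path, target: Path, partition_threads: int = 1) -> int:
    """Move all partitions; partition_threads=1 means one partition at a time."""
    parts = [p for p in staging.iterdir() if p.is_dir()]
    with ThreadPoolExecutor(max_workers=partition_threads) as pool:
        moved = pool.map(lambda p: move_partition(p, target / p.name), parts)
        return sum(moved)


def make_staging(root: Path, partitions: int, files_per_partition: int) -> Path:
    """Create a toy staging directory (hypothetical layout, for illustration)."""
    staging = root / "staging"
    for i in range(partitions):
        part = staging / f"dt={i:02d}"
        part.mkdir(parents=True)
        for j in range(files_per_partition):
            (part / f"{j:06d}_0").write_text("x")
    return staging


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        staging = make_staging(root, partitions=10, files_per_partition=100)
        # Prints the total number of files moved.
        print(move_table(staging, root / "target", partition_threads=10))
```

On a local disk the difference is small; the point is only the structure.
Against an object store, where each move has noticeable latency, the outer
pool is the part that is missing in what I am seeing.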

Is there a way to move partitions in parallel?




On Mon, May 22, 2017 at 11:45 AM, Sujeet Pardeshi <sujeet.parde...@sas.com>
wrote:

> You should create the partitions locally in HDFS, then move them to your
> external location with a copy command (is it an S3 bucket?). You will see
> a massive gain in performance.
>
>
>
> Regards,
>
> Sujeet Singh Pardeshi
>
> Software Specialist
>
> SAS Research and Development (India) Pvt. Ltd.
>
> Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar  Pune, Maharashtra,
> 411 013
> *o*ff: +91-20-49118448
>
>  *"When the solution is simple, God is answering…" *
>
>
>
> *From:* Rishi Aggarwal [mailto:ri...@hike.in]
> *Sent:* 21 May 2017 AM 11:44
> *To:* user@hive.apache.org
> *Subject:* How to perform hive moveTask in parallel?
>
>
>
> I am running an INSERT OVERWRITE query on an external table which is
> partitioned (192 partitions).
>
> Running EXPLAIN shows two main stages:
>
> 1. MR stage (8 mappers and 10 reducers)
>
> 2. Move stage
>
> The MR stage completes in 15-20 minutes.
>
> The move stage takes about *3 hours*.
>
> Looking further, I found that the reducers write to a temporary location,
> and the move stage then moves the output to the target location. The move
> from temp to target happens sequentially, and since I have 192 partitions
> and 10 reducers, it takes 3 hours to move all the files.
>
> Is there a way to do the move in parallel?
>
> Hive Version: 1.2.1
>
