[ https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-14204:
------------------------------------
    Attachment: HIVE-14204.4.patch

Synchronized access to the metastore client (MSC) costs some performance 
compared with the initial improvements, but the gain over the unpatched build 
is still significant (354.176 s vs. 122.517 s, roughly 2.9x faster). Attaching 
the patch, which uses SynchronizedMetaStoreClient. This class has been moved 
out of DbTxnManager.
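To illustrate the trade-off, here is a minimal Java sketch. It is not the code from HIVE-14204.4.patch. It assumes a simplified, hypothetical MetaStoreClient interface standing in for Hive's metastore client, plus hypothetical helpers InMemoryMetaStoreClient, SyncMetaStoreClient and ParallelPartitionLoader. It also assumes partitions are loaded on a thread pool while every metastore call goes through one synchronized wrapper, which is the general idea behind a synchronized metastore client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-in for a metastore client (NOT Hive's real API).
interface MetaStoreClient {
    void addPartition(String table, String partSpec);
    int partitionCount(String table);
}

// Hypothetical client that is not thread-safe (HashMap/ArrayList, no locking),
// modelling a single connection that must not be used concurrently.
class InMemoryMetaStoreClient implements MetaStoreClient {
    private final Map<String, List<String>> parts = new HashMap<>();

    public void addPartition(String table, String partSpec) {
        parts.computeIfAbsent(table, t -> new ArrayList<>()).add(partSpec);
    }

    public int partitionCount(String table) {
        List<String> p = parts.get(table);
        return p == null ? 0 : p.size();
    }
}

// Sketch of the synchronized-wrapper idea: every call is serialized on this
// object's monitor, so many loader threads can share one underlying client.
final class SyncMetaStoreClient implements MetaStoreClient {
    private final MetaStoreClient delegate;

    SyncMetaStoreClient(MetaStoreClient delegate) {
        this.delegate = delegate;
    }

    public synchronized void addPartition(String table, String partSpec) {
        delegate.addPartition(table, partSpec);
    }

    public synchronized int partitionCount(String table) {
        return delegate.partitionCount(table);
    }
}

// Hypothetical loader: submits one task per partition to a fixed thread pool.
final class ParallelPartitionLoader {
    static void loadAll(MetaStoreClient msc, String table,
                        List<String> partSpecs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String spec : partSpecs) {
                futures.add(pool.submit(() -> {
                    // Per-partition file work (e.g. moving data files) would run
                    // here in parallel, without holding the metastore lock.
                    msc.addPartition(table, spec); // serialized by the wrapper
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and rethrow any failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the metastore calls take the lock, so per-partition work done before each call can still overlap. The cost is that metastore round trips no longer run concurrently, which matches the slowdown relative to the initial unsynchronized attempt.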

Without Patch:
==============
INSERT INTO TABLE web_sales_test partition(ws_sold_date_sk) select * from tpcds_bin_partitioned_orc_200.web_sales;
Time taken to load dynamic partitions: *354.176 seconds*

With Patch (004):
=================
INSERT INTO TABLE web_sales_test partition(ws_sold_date_sk) select * from tpcds_bin_partitioned_orc_200.web_sales;
Time taken to load dynamic partitions: *122.517 seconds*



> Optimize loading dynamic partitions 
> ------------------------------------
>
>                 Key: HIVE-14204
>                 URL: https://issues.apache.org/jira/browse/HIVE-14204
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset 
> sequentially on the driver side. For example, the simple dynamic partition 
> load below takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
