[ https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated HIVE-14204:
------------------------------------
Attachment: HIVE-14204.4.patch
Synchronizing access to the metastore client (MSC) costs some performance
compared with the initial improvements, but the gain over the unpatched code
is still significant. Attaching a patch that uses SynchronizedMetaStoreClient;
this class has been moved out of DbTxnManager.
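
The trade-off described above can be illustrated with a minimal sketch. It
is not the code from the attached patch: it assumes partitions are
registered by several worker threads that share one client, and every name
in it (PartitionRegistry, SimpleRegistry, SynchronizedRegistry,
DynamicPartitionLoadSketch) is hypothetical. The wrapper serializes calls on
the shared client, which is where the extra cost comes from, while any
per-partition work done outside the client can still overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a client that is not thread-safe by itself.
interface PartitionRegistry {
    void addPartition(String partitionName);
    Set<String> listPartitions();
}

// Plain implementation with no locking (illustrative only).
class SimpleRegistry implements PartitionRegistry {
    private final Set<String> partitions = new TreeSet<>();

    public void addPartition(String partitionName) {
        partitions.add(partitionName);
    }

    public Set<String> listPartitions() {
        return new TreeSet<>(partitions);
    }
}

// Wrapper that lets only one thread at a time call into the delegate.
class SynchronizedRegistry implements PartitionRegistry {
    private final PartitionRegistry delegate;

    SynchronizedRegistry(PartitionRegistry delegate) {
        this.delegate = delegate;
    }

    public synchronized void addPartition(String partitionName) {
        delegate.addPartition(partitionName);
    }

    public synchronized Set<String> listPartitions() {
        return delegate.listPartitions();
    }
}

class DynamicPartitionLoadSketch {
    // Registers every partition name from a fixed-size pool of worker threads.
    static Set<String> loadAll(List<String> names, int threads) {
        PartitionRegistry registry = new SynchronizedRegistry(new SimpleRegistry());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String name : names) {
                // Work that touches no shared state could run here in parallel;
                // only the call on the shared registry is serialized.
                futures.add(pool.submit(() -> {
                    registry.addPartition(name);
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface worker failures
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition load failed", e);
        } finally {
            pool.shutdown();
        }
        return registry.listPartitions();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "ws_sold_date_sk=1", "ws_sold_date_sk=2", "ws_sold_date_sk=3");
        System.out.println(loadAll(names, 4));
    }
}
```

Wrapping the client keeps the locking in one place, so callers do not have
to remember to synchronize around each call themselves.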
Without Patch:
==============
INSERT INTO TABLE web_sales_test partition(ws_sold_date_sk) select * from
tpcds_bin_partitioned_orc_200.web_sales;
Time taken to load dynamic partitions: *354.176 seconds*
With Patch (004):
================
INSERT INTO TABLE web_sales_test partition(ws_sold_date_sk) select * from
tpcds_bin_partitioned_orc_200.web_sales;
Time taken to load dynamic partitions: *122.517 seconds*
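
For reference, the two timings above work out to roughly the following
speedup and reduction (rounded, and derived only from the two numbers
reported in this comment):

```latex
\[
\text{speedup} = \frac{354.176}{122.517} \approx 2.89,
\qquad
\text{reduction} = 1 - \frac{122.517}{354.176} \approx 65.4\%
\]
```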
> Optimize loading dynamic partitions
> ------------------------------------
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch,
> HIVE-14204.4.patch
>
>
> A lot of time is spent sequentially loading dynamically partitioned
> datasets on the driver side. E.g., a simple dynamic partition load such as
> the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}