[ 
https://issues.apache.org/jira/browse/HIVE-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15240:
------------------------------------
    Attachment: HIVE-15240.1.patch

"StatsTask -> updateQuickStats" already updates fast stats such as the number 
of files and total size in the partition parameters, so there is no need to 
update them again in "StatsTask -> aggregateStats -> db.alterPartitions".
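
To illustrate why the second update is redundant, here is a minimal toy model in Python. It is not Hive code: FakeStore, quick_stats and run_stats_task are hypothetical names, and the model assumes that computing the fast stats costs one directory listing per partition, which is the slow call on S3.

```python
# Toy model, not Hive code: every class and function name here is hypothetical.
class FakeStore:
    """Counts directory listings, assumed to be the expensive call on S3."""

    def __init__(self, files_by_partition):
        self.files_by_partition = files_by_partition
        self.list_calls = 0

    def list_files(self, partition):
        self.list_calls += 1
        return self.files_by_partition[partition]


def quick_stats(store, partition):
    """Compute the fast stats (file count, total size) with one listing."""
    sizes = store.list_files(partition)
    return {"numFiles": len(sizes), "totalSize": sum(sizes)}


def run_stats_task(store, partitions, recompute_on_alter):
    params = {}
    # Step 1: the quick-stats pass fills in numFiles/totalSize per partition.
    for p in partitions:
        params[p] = quick_stats(store, p)
    # Step 2: the alter-partitions pass. Recomputing the fast stats here
    # repeats the listing for every partition and yields identical values.
    if recompute_on_alter:
        for p in partitions:
            params[p].update(quick_stats(store, p))
    return params


# Four made-up partitions, each holding two files of 100 and 200 bytes.
files = {f"wr_returned_date_sk={i}": [100, 200] for i in range(4)}
before, after = FakeStore(files), FakeStore(files)
stats_before = run_stats_task(before, list(files), recompute_on_alter=True)
stats_after = run_stats_task(after, list(files), recompute_on_alter=False)
print(stats_before == stats_after)          # same partition parameters
print(before.list_calls, after.list_calls)  # listings with and without recompute
```

In this model the resulting partition parameters are the same either way, while skipping the recompute halves the number of listings.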

{noformat}
drop web_returns table and populate it back with the data.
..
..
insert into table web_returns partition(wr_returned_date_sk) select * from 
tpcds_bin_partitioned_200.web_returns
{noformat}

The total number of partitions was around 2184, and the entire dataset was in S3.

||Run ID||Without Patch (secs)||With Patch (secs)||
|Run 1|1025.526|688.752|
|Run 2|1031.841|665.139|
|Run 3|1017.168|669.171|

A *~35%* reduction in response time (roughly 33-36% per run) is observed with the patch.
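
The reduction quoted above can be recomputed from the table. The following is a small Python sketch; the timings are copied from the table, and the helper name reduction_pct is only illustrative.

```python
# Timings in seconds, copied from the table above: (without patch, with patch).
runs = {
    "Run 1": (1025.526, 688.752),
    "Run 2": (1031.841, 665.139),
    "Run 3": (1017.168, 669.171),
}


def reduction_pct(before, after):
    """Percentage reduction in response time going from `before` to `after`."""
    return (before - after) / before * 100.0


per_run = {name: reduction_pct(b, a) for name, (b, a) in runs.items()}

# Overall reduction, computed on the summed timings of all three runs.
total_before = sum(b for b, _ in runs.values())
total_after = sum(a for _, a in runs.values())
overall = reduction_pct(total_before, total_after)

for name, pct in per_run.items():
    print(f"{name}: {pct:.1f}% reduction")
print(f"Overall: {overall:.1f}% reduction")
```

This gives 32.8%, 35.5% and 34.2% for the three runs, and 34.2% over the summed timings.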

> Updating/Altering stats in metastore can be expensive in S3
> -----------------------------------------------------------
>
>                 Key: HIVE-15240
>                 URL: https://issues.apache.org/jira/browse/HIVE-15240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15240.1.patch
>
>
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L630
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L367
> If there are 100 partitions, it iterates over every partition to determine 
> its location, which takes a considerable amount of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)