[ https://issues.apache.org/jira/browse/HIVE-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated HIVE-15240:
------------------------------------
    Attachment: HIVE-15240.1.patch

As part of "StatsTask -> updateQuickStats" itself, statistics such as the number of files and total size are already updated in the partition parameters, so there is no need to update these fast stats again in "StatsTask -> aggregateStats -> db.alterPartitions".

{noformat}
drop web_returns table and populate it back with the data.
..
..
insert into table web_returns partition(wr_returned_date_sk) select * from tpcds_bin_partitioned_200.web_returns
{noformat}

The total number of partitions was around 2184, and the entire dataset was in S3.

||Run ID||Without Patch (secs)||With Patch (secs)||
|Run 1|1025.526|688.752|
|Run 2|1031.841|665.139|
|Run 3|1017.168|669.171|

A *~35%* reduction in response time is observed with the patch.

> Updating/Altering stats in metastore can be expensive in S3
> -----------------------------------------------------------
>
>                 Key: HIVE-15240
>                 URL: https://issues.apache.org/jira/browse/HIVE-15240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15240.1.patch
>
>
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L630
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L367
> If there are 100 partitions, it iterates over every partition to determine its
> location, which takes a good amount of time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
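
For anyone who wants to sanity-check the ~35% figure, here is a small Java sketch that recomputes the reduction from the three runs in the table. Only the timings come from the comment; the class and method names (ReductionCheck, mean, reductionPercent) are made up for this illustration and are not part of Hive or the patch.

```java
public class ReductionCheck {
    // Response times in seconds, copied from the run table in the comment.
    static final double[] WITHOUT_PATCH = {1025.526, 1031.841, 1017.168};
    static final double[] WITH_PATCH = {688.752, 665.139, 669.171};

    // Arithmetic mean of the given values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Relative reduction from 'before' to 'after', as a percentage of 'before'.
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Per-run reduction.
        for (int i = 0; i < WITHOUT_PATCH.length; i++) {
            System.out.printf("Run %d: %.1f%% reduction%n",
                    i + 1, reductionPercent(WITHOUT_PATCH[i], WITH_PATCH[i]));
        }
        // Reduction computed on the mean of the three runs.
        System.out.printf("Mean: %.1f%% reduction%n",
                reductionPercent(mean(WITHOUT_PATCH), mean(WITH_PATCH)));
    }
}
```

The per-run reductions come out at roughly 33% to 36%, and about 34% on the averaged timings, which is in line with the rounded ~35% quoted.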