[ 
https://issues.apache.org/jira/browse/HIVE-24776?focusedWorklogId=693823&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-693823
 ]

ASF GitHub Bot logged work on HIVE-24776:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Dec/21 10:04
            Start Date: 10/Dec/21 10:04
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #2636:
URL: https://github.com/apache/hive/pull/2636#discussion_r766527736



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
##########
@@ -346,11 +346,12 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
             }
           }
           Deadline.checkTimeout();
+          Table table = msdb.getTable(catName, newDbName, newTblName);
           for (Entry<Partition, ColumnStatistics> partColStats : columnStatsNeedUpdated.entries()) {
             ColumnStatistics newPartColStats = partColStats.getValue();
             newPartColStats.getStatsDesc().setDbName(newDbName);
             newPartColStats.getStatsDesc().setTableName(newTblName);
-            msdb.updatePartitionColumnStatistics(newPartColStats, partColStats.getKey().getValues(),
+            msdb.updatePartitionColumnStatistics(table, newPartColStats, partColStats.getKey().getValues(),

Review comment:
       Looking at the code above, I'm wondering why we need this at all. I 
believe `alterPartitions` clears the stat data (though I've not seen it done 
explicitly), and this added logic here adds it back after that was done.
   
   * Are the old values really removed?
   * Can't we simply retain the old stat values? At the end of the day that's 
what happens here, unless I've missed something. Doing this would reduce the 
number of calls drastically, since we would simply retain things.
   
   It also seems that `alterPartitions` / `alterTable` kills the basic stat 
state as well; after a rename I don't think we must do that.
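
   For reference, a minimal sketch of what the hunk above changes: the table is looked up once before the loop and passed into each per-partition update, instead of being resolved again inside every call. `CountingStore` and `HoistedLookupDemo` are hypothetical stand-ins that only count simulated lookups; they are not the `RawStore` / `ObjectStore` API, and the sketch says nothing about whether the stats need to be rewritten at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metastore backend; NOT the real RawStore API.
class CountingStore {
    int tableLookups = 0; // simulated DB round trips to resolve the table
    int statWrites = 0;   // simulated per-partition stat updates

    String getTable(String db, String tbl) {
        tableLookups++;
        return db + "." + tbl;
    }

    // Old shape: the store resolves the table itself on every call.
    void updatePartitionStats(String db, String tbl, String partition) {
        getTable(db, tbl);
        statWrites++;
    }

    // New shape: the caller passes the table it has already resolved.
    void updatePartitionStats(String table, String partition) {
        if (table == null) {
            throw new IllegalArgumentException("table must not be null");
        }
        statWrites++;
    }
}

class HoistedLookupDemo {
    // One table lookup per partition.
    static int lookupsPerPartition(List<String> partitions) {
        CountingStore store = new CountingStore();
        for (String p : partitions) {
            store.updatePartitionStats("db", "tbl", p);
        }
        return store.tableLookups;
    }

    // One table lookup in total, done before the loop.
    static int lookupsHoisted(List<String> partitions) {
        CountingStore store = new CountingStore();
        String table = store.getTable("db", "tbl");
        for (String p : partitions) {
            store.updatePartitionStats(table, p);
        }
        return store.tableLookups;
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            partitions.add("p=" + i);
        }
        System.out.println("per-partition lookups: " + lookupsPerPartition(partitions));
        System.out.println("hoisted lookups: " + lookupsHoisted(partitions));
    }
}
```

   In this toy model the lookup count drops from one per partition to one per call; retaining the existing stats, as suggested above, would presumably remove the writes as well.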
   

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -9687,31 +9687,35 @@ private void writeMPartitionColumnStatistics(Table table, Partition partition,
       List<ColumnStatisticsObj> statsObjs = colStats.getStatsObj();
       ColumnStatisticsDesc statsDesc = colStats.getStatsDesc();
       String catName = statsDesc.isSetCatName() ? statsDesc.getCatName() : getDefaultCatalog(conf);
-      MTable mTable = ensureGetMTable(catName, statsDesc.getDbName(), statsDesc.getTableName());
-      Table table = convertToTable(mTable);
-      Partition partition = convertToPart(getMPartition(
-          catName, statsDesc.getDbName(), statsDesc.getTableName(), partVals, mTable), false);
-      List<String> colNames = new ArrayList<>();
+      MTable mTable = null;
+      if(table == null) {

Review comment:
       Let's not play with `null`s and alternate code paths.
   You decided to change the method signature and added `table`; fill it out 
everywhere or remove the parameter.
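
   A small sketch of the shape being asked for, assuming the parameter is kept: one code path that requires the caller to supply the table, with no `table == null` fallback branch. `TableRef` and `StatsWriter` are hypothetical names used only for illustration and do not exist in `ObjectStore`.

```java
import java.util.Objects;

// Hypothetical value type for illustration only; not a metastore class.
final class TableRef {
    final String db;
    final String name;

    TableRef(String db, String name) {
        this.db = Objects.requireNonNull(db, "db");
        this.name = Objects.requireNonNull(name, "name");
    }
}

class StatsWriter {
    // Single code path: the table is mandatory, so there is no branch that
    // silently re-fetches it when the caller passes null.
    static String writePartitionStats(TableRef table, String partition) {
        Objects.requireNonNull(table, "table must be supplied by the caller");
        Objects.requireNonNull(partition, "partition");
        return table.db + "." + table.name + "/" + partition;
    }
}
```

   With this shape a caller that forgets to pass the table fails fast with a `NullPointerException` instead of taking a second, rarely exercised path.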
   






Issue Time Tracking
-------------------

    Worklog Id:     (was: 693823)
    Time Spent: 40m  (was: 0.5h)

> Reduce HMS DB calls during stats updates
> ----------------------------------------
>
>                 Key: HIVE-24776
>                 URL: https://issues.apache.org/jira/browse/HIVE-24776
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Harshit Gupta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
>  When adding a large number of partitions (100s/1000s) to a table, HMS ends up 
> making many getTable calls that are not needed.
> Line numbers in the trace below may vary slightly in apache-master. 
> {noformat}
>       at org.datanucleus.api.jdo.JDOPersistenceManager.jdoRetrieve(JDOPersistenceManager.java:620)
>       at org.datanucleus.api.jdo.JDOPersistenceManager.retrieve(JDOPersistenceManager.java:637)
>       at org.datanucleus.api.jdo.JDOPersistenceManager.retrieve(JDOPersistenceManager.java:646)
>       at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2112)
>       at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2150)
>       at org.apache.hadoop.hive.metastore.ObjectStore.ensureGetMTable(ObjectStore.java:4578)
>       at org.apache.hadoop.hive.metastore.ObjectStore.ensureGetTable(ObjectStore.java:4588)
>       at org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatistics(ObjectStore.java:9264)
>       at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>       at com.sun.proxy.$Proxy27.updatePartitionColumnStatistics(Unknown Source)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartitonColStatsInternal(HiveMetaStore.java:6679)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:8655)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:8592)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>       at com.sun.proxy.$Proxy28.set_aggr_stats_for(Unknown Source)
>       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:19060)
>       at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:19044)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>  {noformat}


