[jira] [Work logged] (HIVE-21075) Metastore: Drop partition performance downgrade with Postgres DB

ASF GitHub Bot (Jira) Thu, 27 May 2021 00:20:06 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=602742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-602742
 ]


ASF GitHub Bot logged work on HIVE-21075:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/May/21 07:19
            Start Date: 27/May/21 07:19
    Worklog Time Spent: 10m 
      Work Description: oleksiy-sayankin commented on a change in pull request #2323:
URL: https://github.com/apache/hive/pull/2323#discussion_r640351626



##########
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -5269,6 +5293,32 @@ private void preDropStorageDescriptor(MStorageDescriptor msd) {
     removeUnusedColumnDescriptor(mcd);
   }
 
+  /**
+   * Get a list of storage descriptors that reference a particular Column Descriptor
+   * @param oldCD the column descriptor to get storage descriptors for
+   * @return a list of storage descriptors
+   */
+  private List<MStorageDescriptor> listStorageDescriptorsWithCD(MColumnDescriptor oldCD, Query query) {
+    boolean success = false;
+    List<MStorageDescriptor> sds = null;
+    try {
+      openTransaction();

Review comment:
   In my understanding, transactions are usually needed when there are CREATE, 
UPDATE or DELETE statements that must behave atomically, that is, either 
everything is committed or nothing is. Consider all the queries we have here:
   
   1. Query for postgres
   
           SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
   
   2. Query for all DBs except postgres
   
           select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
   
   3. Constraint deletion
   
           pm.deletePersistentAll(mConstraintsList);
   
   4. CD deletion
   
           pm.deletePersistent(oldCD);
   
   IMHO we need a transaction only for items 3 and 4, and it should look like this:
   
           1. SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1
           2. select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
           openTransaction()
           3. pm.deletePersistentAll(mConstraintsList);
           4. pm.deletePersistent(oldCD);
           commitTransaction()
   
   whilst the old implementation was
   
           openTransaction()        
           1. select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)
           2. pm.deletePersistentAll(mConstraintsList);
           3. pm.deletePersistent(oldCD);
           commitTransaction()  
   
   Wrapping the read-only query in the transaction, as the old implementation 
does, seems redundant to me. Does that make sense?
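
The ordering proposed in this comment can be sketched as below. This is a minimal illustration under stated assumptions, not the ObjectStore code: `FakeStore`, `hasStorageDescriptorsFor`, `deleteConstraints`, `deleteColumnDescriptor` and the method signature are hypothetical stand-ins that only record the order of calls. The read-only existence check runs before the transaction is opened, and only the two deletes run inside it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed ordering; none of these types are Hive's real API. */
public class DropColumnDescriptorSketch {

    /** In-memory stand-in for the persistence layer that logs each call in order. */
    static final class FakeStore {
        final List<String> log = new ArrayList<>();
        private final int referencingSds;
        private boolean inTransaction = false;

        FakeStore(int referencingSds) {
            this.referencingSds = referencingSds;
        }

        // Read-only check; stands in for either the "limit 1" or the count(1) query.
        boolean hasStorageDescriptorsFor(long cdId) {
            log.add("select(inTx=" + inTransaction + ")");
            return referencingSds > 0;
        }

        void openTransaction() {
            inTransaction = true;
            log.add("open");
        }

        void commitTransaction() {
            inTransaction = false;
            log.add("commit");
        }

        void rollbackTransaction() {
            inTransaction = false;
            log.add("rollback");
        }

        // Stands in for pm.deletePersistentAll(mConstraintsList).
        void deleteConstraints(long cdId) {
            requireTransaction();
            log.add("deleteConstraints");
        }

        // Stands in for pm.deletePersistent(oldCD).
        void deleteColumnDescriptor(long cdId) {
            requireTransaction();
            log.add("deleteColumnDescriptor");
        }

        private void requireTransaction() {
            if (!inTransaction) {
                throw new IllegalStateException("write outside a transaction");
            }
        }
    }

    /** Returns true if the column descriptor was unreferenced and has been deleted. */
    static boolean removeUnusedColumnDescriptor(FakeStore store, long cdId) {
        // Steps 1/2: the read happens before any transaction is opened.
        if (store.hasStorageDescriptorsFor(cdId)) {
            return false;
        }
        boolean committed = false;
        store.openTransaction();
        try {
            // Steps 3/4: both deletes commit together or not at all.
            store.deleteConstraints(cdId);
            store.deleteColumnDescriptor(cdId);
            store.commitTransaction();
            committed = true;
        } finally {
            if (!committed) {
                store.rollbackTransaction();
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore(0);
        boolean removed = removeUnusedColumnDescriptor(store, 42L);
        System.out.println(removed + " " + store.log);
    }
}
```

One trade-off of this ordering is that the check and the deletes are no longer covered by the same transaction, so a storage descriptor referencing the column descriptor could in principle be added between them. Whether that matters here depends on how the metastore isolates concurrent operations, which this sketch does not model.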






Issue Time Tracking
-------------------

    Worklog Id:     (was: 602742)
    Time Spent: 50m  (was: 40m)

> Metastore: Drop partition performance downgrade with Postgres DB
> ----------------------------------------------------------------
>
>                 Key: HIVE-21075
>                 URL: https://issues.apache.org/jira/browse/HIVE-21075
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Yongzhi Chen
>            Assignee: Oleksiy Sayankin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the limit 
> statement, HIVE-9447 makes every backend DB run select count(1) from SDS 
> where SDS.CD_ID=? to check whether the specific CD_ID is referenced in the 
> SDS table before dropping a partition. This select count(1) statement does 
> not scale well in Postgres, and there is no index on the CD_ID column of the 
> SDS table.
> For an SDS table with 1.5 million rows, select count(1) takes 700ms on 
> average without an index and 10-20ms with one. The statement used before 
> HIVE-9447 (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes 
> less than 10ms.



