[ 
https://issues.apache.org/jira/browse/HIVE-21075?focusedWorklogId=688079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-688079
 ]

ASF GitHub Bot logged work on HIVE-21075:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Nov/21 15:05
            Start Date: 30/Nov/21 15:05
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #2826:
URL: https://github.com/apache/hive/pull/2826#discussion_r759369978



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -5265,6 +5265,54 @@ private void copyMSD(MStorageDescriptor newSd, MStorageDescriptor oldSd) {
     oldSd.setStoredAsSubDirectories(newSd.isStoredAsSubDirectories());
   }
 
+  /**
+   * Checks if a column descriptor has any remaining references by storage descriptors
+   * in the db.
+   * @param oldCD the column descriptor to check if it has references or not
+   * @return true if has references
+   */
+  private boolean hasRemainingCDReference(MColumnDescriptor oldCD) {
+    assert oldCD != null;
+    Query query = null;
+
+    /**
+     * In order to workaround oracle not supporting limit statement caused performance issue, HIVE-9447 makes
+     * all the backend DB run select count(1) from SDS where SDS.CD_ID=? to check if the specific CD_ID is
+     * referenced in SDS table before drop a partition. This select count(1) statement does not scale well in
+     * Postgres, and there is no index for CD_ID column in SDS table.
+     * For a SDS table with with 1.5 million rows, select count(1) has average 700ms without index, while in
+     * 10-20ms with index. But the statement before
+     * HIVE-9447( SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) uses less than 10ms .
+     */
+    try {
+      // HIVE-21075: Fix Postgres performance regression caused by HIVE-9447
+      DatabaseProduct dbProduct = DatabaseProduct.determineDatabaseProduct(MetaStoreDirectSql.getProductName(pm), conf);
+      if (dbProduct.isPOSTGRES() || dbProduct.isMYSQL()) {
+        query = pm.newQuery(MStorageDescriptor.class, "this.cd == inCD");
+        query.declareParameters("MColumnDescriptor inCD");
+        List<MStorageDescriptor> referencedSDs = listStorageDescriptorsWithCD(oldCD, query);
+        //if no other SD references this CD, we can throw it out.
+        if (referencedSDs != null && referencedSDs.isEmpty()) {
+          return false;
+        }
+      } else {
+        query = pm.newQuery(
+                "select count(1) from org.apache.hadoop.hive.metastore.model.MStorageDescriptor where (this.cd == inCD)");
+        query.declareParameters("MColumnDescriptor inCD");
+        long count = (Long) query.execute(oldCD);
+        //if no other SD references this CD, we can throw it out.
+        if (count == 0) {
+          return false;
+        }
+      }
+      return true;

Review comment:
       Nit: For me it would be more natural to do this here:
   ```
   return count == 0;
   ```
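
One possible reading of this nit, sketched under the assumption that the Javadoc contract in the diff holds (the method returns true when references remain): each branch could return its result directly instead of falling through to `return true`. Under that contract the count branch would return `count != 0` rather than `count == 0`. The class and method names below are hypothetical and are not part of `ObjectStore`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not code from the pull request. */
final class CdReferenceCheckSketch {

  // Count-based branch: references remain exactly when the count is non-zero.
  static boolean fromCount(long count) {
    return count != 0;
  }

  // List-based branch, mirroring the null handling in the diff:
  // only a non-null, empty result means "no references remain".
  static boolean fromList(List<?> referencedSds) {
    return referencedSds == null || !referencedSds.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(fromCount(0L));
    System.out.println(fromList(Arrays.asList("sd1")));
    System.out.println(fromList(Collections.emptyList()));
  }
}
```

Keeping the two branches as separate expressions also sidesteps the fact that `count` is only in scope inside the `else` block of the original diff.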






Issue Time Tracking
-------------------

    Worklog Id:     (was: 688079)
    Time Spent: 6h 50m  (was: 6h 40m)

> Metastore: Drop partition performance downgrade with Postgres DB
> ----------------------------------------------------------------
>
>                 Key: HIVE-21075
>                 URL: https://issues.apache.org/jira/browse/HIVE-21075
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Yongzhi Chen
>            Assignee: Oleksiy Sayankin
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21075.2.patch
>
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> To work around a performance issue caused by Oracle not supporting the LIMIT
> statement, HIVE-9447 makes all backend DBs run select count(1) from SDS
> where SDS.CD_ID=? to check whether the specific CD_ID is referenced in the
> SDS table before dropping a partition. This select count(1) statement does
> not scale well in Postgres, and there is no index on the CD_ID column in the
> SDS table.
> For an SDS table with 1.5 million rows, select count(1) averages 700ms
> without an index and 10-20ms with one. The statement used before HIVE-9447
> (SELECT * FROM "SDS" "A0" WHERE "A0"."CD_ID" = $1 limit 1) takes less than
> 10ms.
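
As a rough in-memory analogy only (it does not model the Postgres planner, indexes, or the actual metastore queries), the difference between the two statements can be sketched as a full scan versus a scan that stops at the first match. All names below are hypothetical.

```java
/** Hypothetical illustration of "count everything" versus "stop at first match". */
final class CountVsLimitSketch {

  /** Outcome of a scan: whether a match exists and how many rows were examined. */
  static final class Scan {
    final boolean found;
    final int rowsExamined;

    Scan(boolean found, int rowsExamined) {
      this.found = found;
      this.rowsExamined = rowsExamined;
    }
  }

  // Analogue of select count(1) without an index: every row is visited.
  static Scan countAll(long[] cdIds, long target) {
    int matches = 0;
    for (long id : cdIds) {
      if (id == target) {
        matches++;
      }
    }
    return new Scan(matches > 0, cdIds.length);
  }

  // Analogue of "limit 1": the scan ends as soon as one match is seen.
  static Scan firstMatch(long[] cdIds, long target) {
    int examined = 0;
    for (long id : cdIds) {
      examined++;
      if (id == target) {
        return new Scan(true, examined);
      }
    }
    return new Scan(false, examined);
  }

  public static void main(String[] args) {
    long[] cdIds = {7L, 7L, 3L, 7L};
    System.out.println(countAll(cdIds, 7L).rowsExamined);
    System.out.println(firstMatch(cdIds, 7L).rowsExamined);
  }
}
```

Both scans answer the same yes/no question. When no row matches, the early-exit scan still has to examine every row, so the saving applies only when a reference exists.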



