[jira] [Work logged] (HIVE-25867) Partition filter condition should pushed down to metastore query if it is equivalence Predicate

ASF GitHub Bot (Jira) Mon, 21 Mar 2022 09:28:07 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25867?focusedWorklogId=745216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-745216
 ]


ASF GitHub Bot logged work on HIVE-25867:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Mar/22 16:27
            Start Date: 21/Mar/22 16:27
    Worklog Time Spent: 10m 
      Work Description: zzzzming95 commented on pull request #2947:
URL: https://github.com/apache/hive/pull/2947#issuecomment-1074123353


   > 600k partitions are always a pain. We are working on integrating Iceberg 
that would help there. I would guess that the solution is this case should be a 
configuration to turn off DB level filtering on query level. Maybe we already 
have a config for that.
   
   @pvary 
   
   I don't quite understand "turn off DB level filtering"
   
   
   Let me give an example to detail my issue, the hql is :
   
   ```
   select * from src_play_day WHERE dt=20211125 AND u_gtype='activity_workshop' 
limit 10;
   ```
   the `dt` and `u_gtype` are both partition field , **and column type are both 
string.**
   
   
   Before this change , metastore **not push down partition** to HMS DB, it 
will get all partitions and prune partition by expression.**Because in 
`dt=20211125`, dt on the left is of type string and 20211125 on the right is of 
type Long.** Finally,metastore get the required partition information in the 
HMS DB. 
   
   In MYSQL ,it will run **two sql** :
   
   ```
   SELECT `A0`.`PART_NAME` AS NUCORDER0
   FROM `PARTITIONS` `A0`
       LEFT JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID`
       LEFT JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID`
   WHERE `C0`.`NAME` = 'default'
       AND `B0`.`TBL_NAME` = 'src_play_day'
   ORDER BY NUCORDER0;
   
   +
   
   SELECT "PARTITIONS"."PART_ID"
   FROM "PARTITIONS"
       INNER JOIN "TBLS"
       ON "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
           AND "TBLS"."TBL_NAME" = 'src_play_day'
       INNER JOIN "DBS"
       ON "TBLS"."DB_ID" = "DBS"."DB_ID"
           AND "DBS"."NAME" = 'default'
   WHERE "PARTITIONS"."PART_NAME" IN ('dt=20211125/u_gtype=activity_workshop');
   ```
   
   Get all partitions will easy increase the load on the HMS DB.
   
   After this change, metastore will push down partition to HMS DB, and not get 
all partitions.
   
   In MYSQL ,it will run just one sql :
   ```
   SELECT "PARTITIONS"."PART_ID"
   FROM "PARTITIONS"
       INNER JOIN "TBLS"
       ON "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
           AND "TBLS"."TBL_NAME" = 'src_play_day'
       INNER JOIN "DBS"
       ON "TBLS"."DB_ID" = "DBS"."DB_ID"
           AND "DBS"."NAME" = 'default'
       INNER JOIN "PARTITION_KEY_VALS" "FILTER0"
       ON "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
           AND "FILTER0"."INTEGER_IDX" = 0
       INNER JOIN "PARTITION_KEY_VALS" "FILTER1"
       ON "FILTER1"."PART_ID" = "PARTITIONS"."PART_ID"
           AND "FILTER1"."INTEGER_IDX" = 1
   WHERE "FILTER0"."PART_KEY_VAL" = '20211125'
       AND "FILTER1"."PART_KEY_VAL" = 'activity_workshop';
   ```
   
   So this issue is to allow the partition to be pushed down to the HMS DB in 
some cases, so as to avoid getting all the partition  of the table.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 745216)
    Time Spent: 1h 40m  (was: 1.5h)

> Partition filter condition should pushed down to metastore query if it is 
> equivalence Predicate
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25867
>                 URL: https://issues.apache.org/jira/browse/HIVE-25867
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: shezm
>            Assignee: shezm
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The colnum type of the partition is different from the column type of the hql 
> query, the metastore will not push down the query to the RDBMS, but will 
> instead get all PARTITIONS.PART_NAME of the hive table then filter it 
> according to the hql Expression. 
> https://github.com/apache/hive/blob/5b112aa6dcc4e374c0a7c2b24042f24ae6815da1/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1316
> If the hive table has too many partitions and there are multiple hql queries 
> at the same time,RDBMS will increasing CPU IO_WAIT and affect performance.
> If the partition filter condition in hql is an equivalent predicate, the 
> metastore should be pushed down to RDBMS, which can optimize the query 
> performance of hive large tables.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Work logged] (HIVE-25867) Partition filter condition should pushed down to metastore query if it is equivalence Predicate

Reply via email to