[ 
https://issues.apache.org/jira/browse/HIVE-26649?focusedWorklogId=822540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-822540
 ]

ASF GitHub Bot logged work on HIVE-26649:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Nov/22 06:43
            Start Date: 02/Nov/22 06:43
    Worklog Time Spent: 10m 
      Work Description: SelfImpr001 commented on PR #3714:
URL: https://github.com/apache/hive/pull/3714#issuecomment-1299654405

   Can someone please review this pr




Issue Time Tracking
-------------------

            Worklog Id:     (was: 822540)
    Remaining Estimate: 95.5h  (was: 95h 40m)
            Time Spent: 0.5h  (was: 20m)

> Hive metabase performance issues due to slow queries
> ----------------------------------------------------
>
>                 Key: HIVE-26649
>                 URL: https://issues.apache.org/jira/browse/HIVE-26649
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.3.3
>         Environment: metastore db :mysql 5.X
> hive:2.3.3
>            Reporter: yihangqiao
>            Assignee: yihangqiao
>            Priority: Major
>              Labels: metastore, patch, performance, pull-request-available
>         Attachments: image-2022-10-19-14-42-33-073.png
>
>   Original Estimate: 96h
>          Time Spent: 0.5h
>  Remaining Estimate: 95.5h
>
> When the Hive metabase uses Mysql, during the peak period of Hive statement 
> query, the metastore initiates a large amount of DirectSQL, which will cause 
> performance problems in the metabase. The fundamental reason is that some 
> DirectSQL performance problems cause a large number of slow queries at the DB 
> level.
> For example for the following Hive query:
> {code:java}
> select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and 
> sysid='MCFCM' {code}
> where ds and sysid are the primary and secondary partitions of the 
> imd_fcac_safe.fcac_dw_loan_details table, respectively
>  
> The Hive statement will generate the DirectSQL query as follows:
> {code:java}
> explain select PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
> PARTITIONS.TBL_ID = TBLS.TBL_ID     and TBLS.TBL_NAME = 
> 'fcac_dw_loan_details'   inner join DBS on TBLS.DB_ID = DBS.DB_ID      and 
> DBS.NAME = 'imd_fcac_safe' inner join PARTITION_KEY_VALS FILTER0 on 
> FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join 
> PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and 
> FILTER1.INTEGER_IDX = 1 where ( ((FILTER0.PART_KEY_VAL = '2021-10-10') and 
> (FILTER1.PART_KEY_VAL = 'MCFCM')) ) {code}
> !image-2022-10-19-14-42-33-073.png!
>  
> Problems with this statement
> There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause 
> the partition of the same name of the unrelated table to be described when 
> performing an associated query; there is no index column in the 
> PARTITION_KEY_VAL table, so it cannot be accelerated by the index.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to