yihangqiao created HIVE-26649:
---------------------------------

             Summary: Hive metabase performance issues due to slow queries
                 Key: HIVE-26649
                 URL: https://issues.apache.org/jira/browse/HIVE-26649
             Project: Hive
          Issue Type: Improvement
          Components: Metastore
    Affects Versions: 2.3.3
         Environment: metastore db :mysql 5.X

hive:2.3.3
            Reporter: yihangqiao
            Assignee: yihangqiao
             Fix For: 2.3.3
         Attachments: image-2022-10-19-14-42-33-073.png

When the Hive metabase uses Mysql, during the peak period of Hive statement 
query, the metastore initiates a large amount of DirectSQL, which will cause 
performance problems in the metabase. The fundamental reason is that some 
DirectSQL performance problems cause a large number of slow queries at the DB 
level.
For example for the following Hive query:
{code:java}
select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and 
sysid='MCFCM' {code}
where ds and sysid are the primary and secondary partitions of the 
imd_fcac_safe.fcac_dw_loan_details table, respectively
 
The Hive statement will generate the DirectSQL query as follows:
{code:java}
explain select PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on 
PARTITIONS.TBL_ID = TBLS.TBL_ID     and TBLS.TBL_NAME = 'fcac_dw_loan_details'  
 inner join DBS on TBLS.DB_ID = DBS.DB_ID      and DBS.NAME = 'imd_fcac_safe' 
inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID 
and FILTER0.INTEGER_IDX = 0 inner join PARTITION_KEY_VALS FILTER1 on 
FILTER1.PART_ID = PARTITIONS.PART_ID and FILTER1.INTEGER_IDX = 1 where ( 
((FILTER0.PART_KEY_VAL = '2021-10-10') and (FILTER1.PART_KEY_VAL = 'MCFCM')) ) 
{code}
!image-2022-10-19-14-42-33-073.png!
 
Problems with this statement
There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause the 
partition of the same name of the unrelated table to be described when 
performing an associated query; there is no index column in the 
PARTITION_KEY_VAL table, so it cannot be accelerated by the index.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to