[ https://issues.apache.org/jira/browse/HIVE-26649?focusedWorklogId=822540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-822540 ]
ASF GitHub Bot logged work on HIVE-26649: ----------------------------------------- Author: ASF GitHub Bot Created on: 02/Nov/22 06:43 Start Date: 02/Nov/22 06:43 Worklog Time Spent: 10m Work Description: SelfImpr001 commented on PR #3714: URL: https://github.com/apache/hive/pull/3714#issuecomment-1299654405 Can someone please review this pr Issue Time Tracking ------------------- Worklog Id: (was: 822540) Remaining Estimate: 95.5h (was: 95h 40m) Time Spent: 0.5h (was: 20m) > Hive metabase performance issues due to slow queries > ---------------------------------------------------- > > Key: HIVE-26649 > URL: https://issues.apache.org/jira/browse/HIVE-26649 > Project: Hive > Issue Type: Improvement > Components: Metastore > Affects Versions: 2.3.3 > Environment: metastore db :mysql 5.X > hive:2.3.3 > Reporter: yihangqiao > Assignee: yihangqiao > Priority: Major > Labels: metastore, patch, performance, pull-request-available > Attachments: image-2022-10-19-14-42-33-073.png > > Original Estimate: 96h > Time Spent: 0.5h > Remaining Estimate: 95.5h > > When the Hive metabase uses Mysql, during the peak period of Hive statement > query, the metastore initiates a large amount of DirectSQL, which will cause > performance problems in the metabase. The fundamental reason is that some > DirectSQL performance problems cause a large number of slow queries at the DB > level. > For example for the following Hive query: > {code:java} > select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and > sysid='MCFCM' {code} > where ds and sysid are the primary and secondary partitions of the > imd_fcac_safe.fcac_dw_loan_details table, respectively > > The Hive statement will generate the DirectSQL query as follows: > {code:java} > explain select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on > PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = > 'fcac_dw_loan_details' inner join DBS on TBLS.DB_ID = DBS.DB_ID and > DBS.NAME = 'imd_fcac_safe' inner join PARTITION_KEY_VALS FILTER0 on > FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join > PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and > FILTER1.INTEGER_IDX = 1 where ( ((FILTER0.PART_KEY_VAL = '2021-10-10') and > (FILTER1.PART_KEY_VAL = 'MCFCM')) ) {code} > !image-2022-10-19-14-42-33-073.png! > > Problems with this statement > There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause > the partition of the same name of the unrelated table to be described when > performing an associated query; there is no index column in the > PARTITION_KEY_VAL table, so it cannot be accelerated by the index. -- This message was sent by Atlassian Jira (v8.20.10#820010)