[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739725#comment-14739725 ]
Chaoyu Tang commented on HIVE-11786:
------------------------------------

In ORM, DataNucleus will generate a SQL query that joins the tables (TAB_COL_STATS/PART_COL_STATS, DBS, TBLS, PARTITIONS) instead of querying only TAB_COL_STATS/PART_COL_STATS. In DirectSQL, we also need to modify the queries accordingly. Although a query against a single table might run faster than one against multiple joined tables in the DBs that HMS supports, the performance impact should be very small given that the tables are joined via their keys.

> Deprecate the use of redundant column in column stats related tables
> --------------------------------------------------------------------
>
>                 Key: HIVE-11786
>                 URL: https://issues.apache.org/jira/browse/HIVE-11786
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant columns
> such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables already have
> foreign keys such as TBL_ID or PART_ID referencing TBLS or PARTITIONS.
> These redundant columns violate database normalization rules and cause a lot
> of inconvenience (and sometimes difficulty) in implementing column stats related
> features. For example, when renaming a table, we have to update the
> TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA is first to deprecate the use of these columns at the HMS code level. A
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
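
To illustrate the trade-off described in the comment, here is a minimal sketch using Python's built-in sqlite3 module. It is not the HMS DirectSQL code. The schema is heavily simplified: only TAB_COL_STATS, DBS, TBLS, TBL_ID, DB_NAME, and TABLE_NAME come from the issue text, while the other column names (DB_ID, NAME, TBL_NAME, CS_ID, COLUMN_NAME, NUM_NULLS) and all function names are assumptions made for the example. It compares a lookup through the redundant name columns with a lookup that joins on the keys, and shows that after a rename only the join-based lookup still works without touching the stats table.

```python
import sqlite3


def build_demo_metastore():
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DBS (
            DB_ID INTEGER PRIMARY KEY,
            NAME  TEXT NOT NULL UNIQUE
        );
        CREATE TABLE TBLS (
            TBL_ID   INTEGER PRIMARY KEY,
            DB_ID    INTEGER NOT NULL REFERENCES DBS(DB_ID),
            TBL_NAME TEXT NOT NULL
        );
        CREATE TABLE TAB_COL_STATS (
            CS_ID       INTEGER PRIMARY KEY,
            TBL_ID      INTEGER NOT NULL REFERENCES TBLS(TBL_ID),
            COLUMN_NAME TEXT NOT NULL,
            NUM_NULLS   INTEGER,
            DB_NAME     TEXT,  -- redundant: derivable via TBL_ID
            TABLE_NAME  TEXT   -- redundant: derivable via TBL_ID
        );
        INSERT INTO DBS VALUES (1, 'default');
        INSERT INTO TBLS VALUES (10, 1, 'orders');
        INSERT INTO TAB_COL_STATS
            VALUES (100, 10, 'price', 3, 'default', 'orders');
        """
    )
    return conn


def stats_via_redundant_columns(conn, db_name, table_name):
    """Single-table lookup that relies on the redundant name columns."""
    return conn.execute(
        "SELECT COLUMN_NAME, NUM_NULLS FROM TAB_COL_STATS "
        "WHERE DB_NAME = ? AND TABLE_NAME = ?",
        (db_name, table_name),
    ).fetchall()


def stats_via_join(conn, db_name, table_name):
    """Lookup that resolves names through TBLS and DBS, joined on their keys."""
    return conn.execute(
        "SELECT s.COLUMN_NAME, s.NUM_NULLS "
        "FROM TAB_COL_STATS s "
        "JOIN TBLS t ON s.TBL_ID = t.TBL_ID "
        "JOIN DBS d ON t.DB_ID = d.DB_ID "
        "WHERE d.NAME = ? AND t.TBL_NAME = ?",
        (db_name, table_name),
    ).fetchall()


def rename_table(conn, db_name, old_name, new_name):
    """Rename a table by updating TBLS only; the stats table is left alone."""
    conn.execute(
        "UPDATE TBLS SET TBL_NAME = ? "
        "WHERE TBL_NAME = ? "
        "AND DB_ID = (SELECT DB_ID FROM DBS WHERE NAME = ?)",
        (new_name, old_name, db_name),
    )
    conn.commit()
```

After `rename_table(conn, "default", "orders", "orders_v2")`, the join-based lookup finds the stats under the new name, while the lookup through DB_NAME/TABLE_NAME finds nothing unless TAB_COL_STATS is updated as well. That extra update is the maintenance cost the issue description calls unnecessary.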