[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947732#comment-14947732 ]
Siddharth Seth commented on HIVE-11786: --------------------------------------- No difference with the remaining indexes. (The index creation takes a long time btw - and may impact stat generation ?) {code} 2015-10-07T18:38:09,444 DEBUG [main([])]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 16195.018669ms + 0.058186ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" "DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" "PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", "PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", "PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", "PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", "PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", "PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN "PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON ("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = "DBS"."DB_ID")) VW where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME" in (?) and "PARTITION_NAME" in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,? {code} {code} 2015-10-07T18:38:29,309 DEBUG [main([])]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 18651.1996ms + 0.050665ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" "DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" "PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", "PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", "PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", "PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", "PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", "PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN "PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON ("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = "DBS"."DB_ID")) VW where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME" in (?) and "PARTITION_NAME" in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,? {code} > Deprecate the use of redundant column in colunm stats related tables > -------------------------------------------------------------------- > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore > Reporter: Chaoyu Tang > Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)