Naveen Gangam created HIVE-18328:
------------------------------------

             Summary: Improve schematool validator to report duplicate rows for column statistics
                 Key: HIVE-18328
                 URL: https://issues.apache.org/jira/browse/HIVE-18328
             Project: Hive
          Issue Type: Improvement
          Components: Hive
    Affects Versions: 2.1.1
            Reporter: Naveen Gangam
            Assignee: Naveen Gangam


By design, in the {{TAB_COL_STATS}} table of the HMS schema, there should be 
ONE AND ONLY ONE row, representing its statistics, for each column defined in 
Hive. The combination of DB_NAME, TABLE_NAME and COLUMN_NAME constitutes a 
primary key/unique row.
Each time the statistics are computed for a column, this row is updated. 
However, if a BDR/replication process somehow leaves multiple rows in this 
table for a given column, the HMS server is unable to recompute the statistics 
thereafter.
So it would be good to detect this data anomaly via the schema validation tool.
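
As a rough illustration of the check described above, duplicates can be found 
by grouping on the three key columns and keeping the groups with more than one 
row. This is only a sketch under stated assumptions: the table below is a 
simplified stand-in for {{TAB_COL_STATS}} (only the key columns from this 
description plus a surrogate id), SQLite replaces the real metastore database, 
and the names find_duplicate_col_stats and build_demo_db are hypothetical. It 
is not the schematool implementation.

```python
import sqlite3

# Assumption: a simplified stand-in for the metastore table. Only the three
# key columns named in the issue plus a surrogate id are modeled here.
DDL = """
CREATE TABLE TAB_COL_STATS (
    CS_ID       INTEGER PRIMARY KEY,
    DB_NAME     TEXT NOT NULL,
    TABLE_NAME  TEXT NOT NULL,
    COLUMN_NAME TEXT NOT NULL
)
"""

# Group by the logical key and keep only keys that occur more than once.
DUPLICATE_QUERY = """
SELECT DB_NAME, TABLE_NAME, COLUMN_NAME, COUNT(*) AS ROW_COUNT
FROM TAB_COL_STATS
GROUP BY DB_NAME, TABLE_NAME, COLUMN_NAME
HAVING COUNT(*) > 1
ORDER BY DB_NAME, TABLE_NAME, COLUMN_NAME
"""


def find_duplicate_col_stats(conn):
    """Return (db, table, column, row_count) for each key with several rows."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


def build_demo_db():
    """Create an in-memory database holding one deliberately duplicated key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO TAB_COL_STATS (DB_NAME, TABLE_NAME, COLUMN_NAME) "
        "VALUES (?, ?, ?)",
        [
            ("default", "orders", "id"),
            ("default", "orders", "amount"),
            ("default", "orders", "amount"),  # duplicate statistics row
            ("sales", "customers", "name"),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for db, table, column, count in find_duplicate_col_stats(demo):
        print(f"{db}.{table}.{column}: {count} rows")
```

A validator built along these lines could report each offending key together 
with its row count, so that the extra rows can be located and cleaned up 
manually.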



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
