Amareshwari Sriramadasu created HIVE-3962:
---------------------------------------------
Summary: number of distinct values are wrong in column statistics
Key: HIVE-3962
URL: https://issues.apache.org/jira/browse/HIVE-3962
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.10.0
Reporter: Amareshwari Sriramadasu
When we run the following query on the Hive QL src table:
select count(distinct(key)), count(distinct(value)) from src;
309 309
After running the following analyze query, the NUM_DISTINCTS values stored in the metastore (291 for key, 112 for value) do not match the count of 309 returned above:
analyze table src compute statistics for columns key, value;
--- stats in metastore ---
mysql > select * from TAB_COL_STATS where TABLE_NAME="src";
| CS_ID | DB_NAME | TABLE_NAME | COLUMN_NAME | COLUMN_TYPE | TBL_ID | LONG_LOW_VALUE | LONG_HIGH_VALUE | DOUBLE_HIGH_VALUE | DOUBLE_LOW_VALUE | BIG_DECIMAL_LOW_VALUE | BIG_DECIMAL_HIGH_VALUE | NUM_NULLS | NUM_DISTINCTS | AVG_COL_LEN | MAX_COL_LEN | NUM_TRUES | NUM_FALSES | LAST_ANALYZED |
| 5     | default | src        | key         | int         | 11     | 0              | 498             | 0.0000            | 0.0000           | NULL                  | NULL                   | 0         | 291           | 0.0000      | 0           | 0         | 0          | 1359539181    |
| 6     | default | src        | value       | string      | 11     | 0              | 0               | 0.0000            | 0.0000           | NULL                  | NULL                   | 0         | 112           | 6.8120      | 7           | 0         | 0          | 1359539181    |
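
To quantify the gap, here is a minimal sketch that compares the exact counts from the count(distinct ...) query with the NUM_DISTINCTS values stored in TAB_COL_STATS. The numbers are copied from the output above. The helper name relative_error is hypothetical and used only for illustration; it is not part of Hive.

```python
# Exact distinct counts returned by the count(distinct ...) query.
exact = {"key": 309, "value": 309}
# NUM_DISTINCTS values found in TAB_COL_STATS after the analyze query.
stored = {"key": 291, "value": 112}


def relative_error(actual: int, estimate: int) -> float:
    """Return |estimate - actual| / actual (hypothetical helper)."""
    return abs(estimate - actual) / actual


# Relative error of the stored statistic for each column.
errors = {col: relative_error(exact[col], stored[col]) for col in exact}

for col, err in errors.items():
    print(f"{col}: stored {stored[col]} vs exact {exact[col]} ({err:.1%} off)")
# key: stored 291 vs exact 309 (5.8% off)
# value: stored 112 vs exact 309 (63.8% off)
```

The key column is off by a few percent, while the value column is off by well over half, so the two columns do not appear to be affected equally.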