Chaoyu Tang created HIVE-16147:
----------------------------------
Summary: Rename a partitioned table should not drop its partition
columns stats
Key: HIVE-16147
URL: https://issues.apache.org/jira/browse/HIVE-16147
Project: Hive
Issue Type: Bug
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename),
describing its partition shows that the partition column stats are still
accurate, but they have actually all been dropped.
It can be reproduced as follows:
1. analyze table sample_pt compute statistics for columns;
2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS
for all columns are true
{code}
...
# Detailed Partition Information
Partition Value: [3]
Database: default
Table: sample_pt
CreateTime: Fri Jan 20 15:42:30 EST 2017
LastAccessTime: UNKNOWN
Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
Partition Parameters:
COLUMN_STATS_ACCURATE
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
last_modified_by ctang
last_modified_time 1485217063
numFiles 1
numRows 100
rawDataSize 5143
totalSize 5243
transient_lastDdlTime 1488842358
...
{code}
3. describe formatted default.sample_pt partition (dummy = 3) salary: column
stats exist
{code}
# col_name  data_type  min  max     num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
salary      int        1    151370  0          94                                                               from deserializer
{code}
4. alter table sample_pt rename to sample_pt_rename;
5. describe formatted default.sample_pt_rename partition (dummy = 3): describing
the renamed table's partition (dummy = 3) shows that COLUMN_STATS for all columns
are still true.
{code}
# Detailed Partition Information
Partition Value: [3]
Database: default
Table: sample_pt_rename
CreateTime: Fri Jan 20 15:42:30 EST 2017
LastAccessTime: UNKNOWN
Location: file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
Partition Parameters:
COLUMN_STATS_ACCURATE
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
last_modified_by ctang
last_modified_time 1485217063
numFiles 1
numRows 100
rawDataSize 5143
totalSize 5243
transient_lastDdlTime 1488842358
{code}
6. describe formatted default.sample_pt_rename partition (dummy = 3) salary: the
column stats have been dropped.
{code}
# col_name data_type comment
salary int from deserializer
Time taken: 0.131 seconds, Fetched: 3 row(s)
{code}
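The mismatch shown in steps 5 and 6 can be checked mechanically. The following is only a minimal sketch, not part of Hive: the helper name stale_column_stats_flags is hypothetical, and it assumes the COLUMN_STATS_ACCURATE value is copied by hand from the describe output (with its backslash-escaped quotes) and that the set of columns which still have stats was collected separately, e.g. from per-column describe calls.

```python
import json


def stale_column_stats_flags(column_stats_accurate, columns_with_stats):
    """Return columns flagged accurate in COLUMN_STATS_ACCURATE that have no stats.

    Hypothetical helper for illustration only; it is not a Hive API.
    """
    # The describe output prints the JSON with backslash-escaped quotes; undo that.
    params = json.loads(column_stats_accurate.replace('\\"', '"'))
    flagged = {
        col
        for col, flag in params.get("COLUMN_STATS", {}).items()
        if str(flag).lower() == "true"
    }
    # Anything flagged as accurate but absent from the actual stats is stale.
    return sorted(flagged - set(columns_with_stats))


# Value copied from the partition parameters of sample_pt_rename in step 5.
accurate = r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}'

# Per the report, all column stats are gone after the rename, so the set is empty.
print(stale_column_stats_flags(accurate, columns_with_stats=[]))
```

With the values from this report, every flagged column is returned as stale, which is the inconsistency this issue describes.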