[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081590#comment-14081590 ]
Damien Carol commented on HIVE-7506:
------------------------------------
[~hagleitn] Interesting. I understand your need better now.
Again, the ANALYZE command covers case (2).
{quote}
(2) As database runs, the statistics of a column in a table (or a partition of
a table) may change. We need a way or a mechanism to synchronize.
{quote}
Now for (1), I don't think providing a syntax to force values that
don't match the real data is a good idea.
If the main goal is to provide "fake" values from the metastore, maybe we can
use this method to update the stats in unit tests:
{code:title=ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541}
public boolean updateTableColumnStatistics(ColumnStatistics statsObj)
    throws HiveException {
  try {
    return getMSC().updateTableColumnStatistics(statsObj);
  } catch (Exception e) {
    LOG.debug(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}

public boolean updatePartitionColumnStatistics(ColumnStatistics statsObj)
    throws HiveException {
  try {
    return getMSC().updatePartitionColumnStatistics(statsObj);
  } catch (Exception e) {
    LOG.debug(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541
The metastore Thrift server and the Hive ANALYZE command use the same call:
{code:title=ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339}
// Fetch result of the analyze table partition (p1=c1).. compute statistics for columns ..
// Construct a column statistics object from the result
List<ColumnStatistics> colStats = constructColumnStatsFromPackedRows();

// Persist the column statistics object to the metastore
for (ColumnStatistics colStat : colStats) {
  db.updatePartitionColumnStatistics(colStat);
}
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339
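For the unit-test idea, here is a minimal, self-contained sketch of the pattern rather than real Hive code. Every type in it (ColumnStats, StatsClient, InMemoryStatsClient, StatsException) is a hypothetical stand-in invented so the example runs without a metastore. Only the delegate-and-wrap shape is taken from the Hive.java methods quoted above. A real test would presumably build a metastore ColumnStatistics object and pass it to updateTableColumnStatistics instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only: none of these types are Hive classes.
class FakeColumnStatsSketch {

  /** Simplified column statistics (stand-in for the metastore's ColumnStatistics). */
  public static final class ColumnStats {
    public final String table;
    public final String column;
    public final long numNulls;
    public final long numDistinct;

    public ColumnStats(String table, String column, long numNulls, long numDistinct) {
      this.table = table;
      this.column = column;
      this.numNulls = numNulls;
      this.numDistinct = numDistinct;
    }
  }

  /** Stand-in for the metastore client that the wrapper delegates to. */
  public interface StatsClient {
    boolean updateTableColumnStatistics(ColumnStats stats) throws Exception;
  }

  /** Checked exception playing the role of HiveException in this sketch. */
  public static final class StatsException extends Exception {
    public StatsException(Throwable cause) {
      super(cause);
    }
  }

  /** In-memory client that a unit test can preload with "fake" statistics. */
  public static final class InMemoryStatsClient implements StatsClient {
    private final Map<String, ColumnStats> stats = new HashMap<>();

    @Override
    public boolean updateTableColumnStatistics(ColumnStats s) {
      stats.put(s.table + "." + s.column, s);
      return true;
    }

    public ColumnStats get(String table, String column) {
      return stats.get(table + "." + column);
    }
  }

  /** Same shape as the quoted wrapper: delegate, and wrap any failure. */
  public static boolean updateTableColumnStatistics(StatsClient client, ColumnStats stats)
      throws StatsException {
    try {
      return client.updateTableColumnStatistics(stats);
    } catch (Exception e) {
      throw new StatsException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    InMemoryStatsClient client = new InMemoryStatsClient();
    // Inject made-up statistics without loading any table data.
    updateTableColumnStatistics(client, new ColumnStats("orders", "customer_id", 0L, 1_000_000L));
    System.out.println(client.get("orders", "customer_id").numDistinct);
  }
}
```

With something like this, a CBO test could set the statistics it wants directly through the existing update path, with no new DDL syntax needed.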
> MetadataUpdater: provide a mechanism to edit the statistics of a column in a
> table (or a partition of a table)
> --------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-7506
> URL: https://issues.apache.org/jira/browse/HIVE-7506
> Project: Hive
> Issue Type: New Feature
> Components: Database/Schema
> Reporter: pengcheng xiong
> Assignee: pengcheng xiong
> Priority: Minor
> Original Estimate: 252h
> Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a
> partition of a table). If we would like to test whether CBO chooses the best
> plan under different statistics, it would be time consuming if we load the
> whole table and create the statistics from ground up.
> (2) As database runs, the statistics of a column in a table (or a partition
> of a table) may change. We need a way or a mechanism to synchronize.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE
> STATISTICS col_statistics [COMMENT col_comment]