[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081590#comment-14081590 ]
Damien Carol commented on HIVE-7506:
------------------------------------

[~hagleitn] interesting. I understand your need better now.

Again, the ANALYZE command covers case (2).
{quote}
(2) As database runs, the statistics of a column in a table (or a partition of a table) may change. We need a way or a mechanism to synchronize.
{quote}

As for (1), I don't think that providing a syntax to force values that don't match the real data is a good idea.
If the main goal is to provide "fake" values from the metastore, maybe we can use these methods to update the stats in unit tests:

{code:title=ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541}
  public boolean updateTableColumnStatistics(ColumnStatistics statsObj) throws HiveException {
    try {
      return getMSC().updateTableColumnStatistics(statsObj);
    } catch (Exception e) {
      LOG.debug(StringUtils.stringifyException(e));
      throw new HiveException(e);
    }
  }

  public boolean updatePartitionColumnStatistics(ColumnStatistics statsObj) throws HiveException {
    try {
      return getMSC().updatePartitionColumnStatistics(statsObj);
    } catch (Exception e) {
      LOG.debug(StringUtils.stringifyException(e));
      throw new HiveException(e);
    }
  }
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541

The metastore Thrift server and the Hive ANALYZE command use this one:

{code:title=ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339}
  // Fetch result of the analyze table partition (p1=c1).. compute statistics for columns ..
  // Construct a column statistics object from the result
  List<ColumnStatistics> colStats = constructColumnStatsFromPackedRows();

  // Persist the column statistics object to the metastore
  for (ColumnStatistics colStat : colStats) {
    db.updatePartitionColumnStatistics(colStat);
  }
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339

> MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7506
>                 URL: https://issues.apache.org/jira/browse/HIVE-7506
>             Project: Hive
>          Issue Type: New Feature
>          Components: Database/Schema
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a partition of a table). If we would like to test whether CBO chooses the best plan under different statistics, it would be time consuming if we load the whole table and create the statistics from ground up.
> (2) As database runs, the statistics of a column in a table (or a partition of a table) may change. We need a way or a mechanism to synchronize.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE STATISTICS col_statistics [COMMENT col_comment]
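
As a rough illustration of the unit-test idea in the comment above (writing "fake" column statistics through an update call rather than through a new DDL syntax), here is a minimal, self-contained sketch. `FakeColumnStats`, `InMemoryStatsStore` and `FakeStatsExample` are hypothetical stand-ins invented for this example and are not Hive classes; only the method name `updateTableColumnStatistics` is borrowed from the snippet quoted above, and its signature here is deliberately simplified. A real test would presumably build a metastore `ColumnStatistics` object and pass it to the `Hive` method instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one column's statistics; NOT a Hive class.
final class FakeColumnStats {
    final long numNulls;
    final long numDistinctValues;

    FakeColumnStats(long numNulls, long numDistinctValues) {
        this.numNulls = numNulls;
        this.numDistinctValues = numDistinctValues;
    }
}

// Hypothetical in-memory stand-in for the metastore; NOT a Hive class.
final class InMemoryStatsStore {
    private final Map<String, FakeColumnStats> statsByColumn = new HashMap<>();

    // Borrows only the name of the Hive method quoted above; the signature is simplified.
    boolean updateTableColumnStatistics(String table, String column, FakeColumnStats stats) {
        statsByColumn.put(table + "." + column, stats);
        return true;
    }

    // Returns the stored statistics, or null if none were set for this column.
    FakeColumnStats getTableColumnStatistics(String table, String column) {
        return statsByColumn.get(table + "." + column);
    }
}

final class FakeStatsExample {
    public static void main(String[] args) {
        InMemoryStatsStore store = new InMemoryStatsStore();

        // Inject statistics directly, without loading any rows into the table.
        store.updateTableColumnStatistics("orders", "customer_id",
                new FakeColumnStats(0L, 1_000_000L));

        FakeColumnStats stats = store.getTableColumnStatistics("orders", "customer_id");
        System.out.println("Distinct values seen by the code under test: "
                + stats.numDistinctValues);
    }
}
```

The point of the sketch is only that a test can overwrite the stored statistics and then read them back, so different plans can be exercised without reloading data; the table and column names are made up.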