[ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081590#comment-14081590
 ] 

Damien Carol commented on HIVE-7506:
------------------------------------

[~hagleitn] interesting. I understand your need better now.
Again, the ANALYZE command covers case (2).
{quote}
(2) As database runs,  the statistics of a column in a table (or a partition of 
a table) may change. We need a way or a mechanism to synchronize.
{quote}
Now for (1), I don't think that providing a syntax to force values that 
don't match the real data is a good idea.
If the main goal is to supply "fake" values from the metastore, maybe we can use 
this method to update the stats in unit tests:
{code:title=ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541}
public boolean updateTableColumnStatistics(ColumnStatistics statsObj) throws HiveException {
  try {
    return getMSC().updateTableColumnStatistics(statsObj);
  } catch (Exception e) {
    LOG.debug(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}

public boolean updatePartitionColumnStatistics(ColumnStatistics statsObj) throws HiveException {
  try {
    return getMSC().updatePartitionColumnStatistics(statsObj);
  } catch (Exception e) {
    LOG.debug(StringUtils.stringifyException(e));
    throw new HiveException(e);
  }
}
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2541
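
To make the unit-test idea more concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is an assumption for illustration: ColumnStats, StatsStore and InMemoryStatsStore are simplified stand-ins I made up, not Hive's real Thrift-generated types or the metastore client, and the table name, column name and values are invented. It only shows the shape of the approach: a test writes made-up statistics through an update call, without loading any data, and reads them back.

```java
import java.util.HashMap;
import java.util.Map;

public class FakeColumnStatsSketch {

  // Hypothetical, simplified stand-in for a column statistics object.
  // The real Hive type carries more fields and type-specific data.
  static final class ColumnStats {
    final String table;
    final String column;
    final long numDistinctValues;
    final long numNulls;

    ColumnStats(String table, String column, long numDistinctValues, long numNulls) {
      this.table = table;
      this.column = column;
      this.numDistinctValues = numDistinctValues;
      this.numNulls = numNulls;
    }
  }

  // Hypothetical stand-in for the part of the metastore client a test would touch.
  interface StatsStore {
    boolean updateTableColumnStatistics(ColumnStats stats);
    ColumnStats getTableColumnStatistics(String table, String column);
  }

  // In-memory implementation so the sketch runs without a metastore.
  static final class InMemoryStatsStore implements StatsStore {
    private final Map<String, ColumnStats> byKey = new HashMap<>();

    @Override
    public boolean updateTableColumnStatistics(ColumnStats stats) {
      byKey.put(stats.table + "." + stats.column, stats);
      return true;
    }

    @Override
    public ColumnStats getTableColumnStatistics(String table, String column) {
      return byKey.get(table + "." + column);
    }
  }

  // What a test setup step might do: store invented statistics directly,
  // instead of loading the table and running ANALYZE.
  static void injectFakeStats(StatsStore store) {
    store.updateTableColumnStatistics(
        new ColumnStats("orders", "customer_id", 1_000_000L, 0L));
  }

  public static void main(String[] args) {
    StatsStore store = new InMemoryStatsStore();
    injectFakeStats(store);
    ColumnStats s = store.getTableColumnStatistics("orders", "customer_id");
    System.out.println(s.column + " NDV=" + s.numDistinctValues);
  }
}
```

In a real test the store would be replaced by the Hive object shown above, and the statistics object would be a ColumnStatistics built for the table under test.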

The metastore Thrift server and the Hive ANALYZE command use the same method:
{code:title=ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339}
    // Fetch result of the analyze table partition (p1=c1).. compute statistics for columns ..
    // Construct a column statistics object from the result
    List<ColumnStatistics> colStats = constructColumnStatsFromPackedRows();
    // Persist the column statistics object to the metastore
    for (ColumnStatistics colStat : colStats) {
      db.updatePartitionColumnStatistics(colStat);
    }
{code}
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java#L339


> MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
> table (or a partition of a table)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7506
>                 URL: https://issues.apache.org/jira/browse/HIVE-7506
>             Project: Hive
>          Issue Type: New Feature
>          Components: Database/Schema
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a 
> partition of a table). If we would like to test whether CBO chooses the best 
> plan under different statistics, it would be time consuming if we load the 
> whole table and create the statistics from ground up.
> (2) As database runs,  the statistics of a column in a table (or a partition 
> of a table) may change. We need a way or a mechanism to synchronize. 
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
> STATISTICS col_statistics [COMMENT col_comment]



--
This message was sent by Atlassian JIRA
(v6.2#6252)
