[ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengcheng xiong updated HIVE-7506:
----------------------------------

    Status: Patch Available  (was: Reopened)

This patch provides the ability to update certain column statistics without 
scanning any data and without "hacking the backend db". It helps (especially 
for CBO work) to set up unit tests quickly and to verify both the CBO and the 
stats subsystem. It also helps when experimenting with the system, for example 
when you are just trying out Hive/Hadoop on a small cluster. Finally, it gives 
you a quick and clean way to fix things when something has gone wrong with the 
stats in your environment.

Usage:

ALTER TABLE table_name [PARTITION partition_spec] UPDATE STATISTICS FOR COLUMN 
col_name SET col_statistics

For example,

ALTER TABLE src_x_int UPDATE STATISTICS FOR COLUMN key SET 
('numDVs'='101','highValue'='10001.0');


ALTER TABLE src_p PARTITION(partitionId=1) UPDATE STATISTICS FOR COLUMN key SET 
('numDVs'='100','avgColLen'='1.0001');
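
A rough sketch of how one might check the result, reusing the src_x_int table 
and key column from the example above. It assumes a Hive build where 
DESCRIBE FORMATTED on a column displays its stored statistics. The ANALYZE 
statement is shown only for contrast: it is the usual scan-based way to compute 
column statistics, which this patch lets you avoid.

```sql
-- Set the statistics directly in the metastore; no data is scanned.
ALTER TABLE src_x_int UPDATE STATISTICS FOR COLUMN key SET
('numDVs'='101','highValue'='10001.0');

-- Inspect the column; the stored statistics should reflect the values
-- set above (assumes column-level DESCRIBE FORMATTED shows stats here).
DESCRIBE FORMATTED src_x_int key;

-- For contrast: the scan-based alternative, which reads the table data
-- and would overwrite the manually set values with computed ones.
ANALYZE TABLE src_x_int COMPUTE STATISTICS FOR COLUMNS key;
```

In a unit test, the first two statements alone are enough to put the planner 
in a known state before running the query under test.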

> MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
> table (or a partition of a table)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7506
>                 URL: https://issues.apache.org/jira/browse/HIVE-7506
>             Project: Hive
>          Issue Type: New Feature
>          Components: Database/Schema
>            Reporter: pengcheng xiong
>            Assignee: pengcheng xiong
>            Priority: Minor
>         Attachments: HIVE-7506.patch
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a 
> partition of a table). If we want to test whether CBO chooses the best plan 
> under different statistics, it is time-consuming to load the whole table and 
> build the statistics from the ground up.
> (2) As the database runs, the statistics of a column in a table (or a 
> partition of a table) may change. We need a mechanism to keep them in sync.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
> STATISTICS col_statistics [COMMENT col_comment]



--
This message was sent by Atlassian JIRA
(v6.2#6252)
