[jira] [Updated] (HIVE-3917) Support fast operation for analyze command

Gang Tim Liu (JIRA) Fri, 18 Jan 2013 23:26:14 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Gang Tim Liu updated HIVE-3917:
-------------------------------

    Description: 
Hive supports the ANALYZE command to gather statistics for existing 
tables and partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table or partition is large, the operation takes a long time because it 
opens every file and scans all the data.

It would be nice to support a fast operation that gathers only the statistics 
which do not require opening the files:
1. Number of files
2. Size in Bytes
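
To illustrate why these two statistics need no scan, here is a minimal sketch 
in Python. It is not Hive's implementation, which this issue does not describe. 
It assumes a local directory standing in for the table's storage location, and 
the function name noscan_stats and the partition directory name are 
hypothetical. Only directory listings and file metadata are consulted, so the 
row count cannot be produced this way.

```python
import os
import tempfile


def noscan_stats(path):
    """Return (num_files, total_size_in_bytes) for all files under path.

    Only directory listings and file metadata (os.stat) are used.
    No file is opened or read, so a row count is not available here.
    """
    num_files = 0
    total_size = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            # st_size comes from the file's metadata, not from its contents.
            total_size += os.stat(os.path.join(dirpath, name)).st_size
            num_files += 1
    return num_files, total_size


if __name__ == "__main__":
    # Throwaway directory laid out like a hypothetical partitioned table.
    with tempfile.TemporaryDirectory() as table_dir:
        part_dir = os.path.join(table_dir, "ds=2013-01-18")
        os.makedirs(part_dir)
        for i, payload in enumerate([b"a\n" * 10, b"bb\n" * 5]):
            with open(os.path.join(part_dir, "part-%05d" % i), "wb") as f:
                f.write(payload)
        print(noscan_stats(table_dir))
```

The cost of this approach grows with the number of files rather than with the 
amount of data, which is the motivation for the fast operation.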

The proposed syntax is:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];

In the future, every statistic that can be gathered without a scan could be 
retrieved via this optional parameter.


  was:
Hive supports the ANALYZE command to gather statistics for existing 
tables and partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table or partition is large, the operation takes a long time because it 
opens every file and scans all the data.

It would be nice to support a fast operation that gathers only the statistics 
which do not require opening the files:
1. Number of files
2. Size in Bytes

The proposed syntax is:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];




    
> Support fast operation for analyze command
> ------------------------------------------
>
>                 Key: HIVE-3917
>                 URL: https://issues.apache.org/jira/browse/HIVE-3917
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 0.11.0
>            Reporter: Gang Tim Liu
>            Assignee: Gang Tim Liu
