[ 
https://issues.apache.org/jira/browse/HIVE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5369:
-----------------------------

    Description: 
Currently, the statistics gathered at the table/partition level and the column 
level are not used during the query planning stage, although they can be used 
to optimize query plans. Basic statistics such as uncompressed data size can be 
used for better reducer estimation. Other statistics, such as the number of 
rows, the number of distinct values per column, and the average column length, 
can be used by the Cost Based Optimizer (CBO) to select better query plans. As 
a first step in improving query planning, the statistics available in the 
metastore should be attached to the Hive operator tree: the operator tree 
should be walked and annotated with statistics information. The attached 
statistics will vary for each operator depending on the operation it performs. 
For example, a select operator changes the average row size but does not affect 
the number of rows. Similarly, a filter operator changes the number of rows but 
does not change the average row size. Similar rules can be applied to other 
operators as well. 
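
The select and filter rules above can be illustrated with a minimal sketch. 
This is not the code in the attached patch: the Stats and Operator classes, 
the operator kinds "TS"/"FIL"/"SEL", and the selectivity and output_row_size 
parameters are all hypothetical names chosen for illustration, and the 
numbers are made up. It only shows the idea of walking the tree from the table 
scan downwards and deriving each operator's statistics from its parent's.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stats:
    """Hypothetical per-operator statistics: row count and average row size in bytes."""
    num_rows: int
    avg_row_size: float

    @property
    def data_size(self) -> float:
        return self.num_rows * self.avg_row_size


@dataclass
class Operator:
    """Hypothetical operator node; 'kind' is one of "TS", "FIL", "SEL"."""
    kind: str
    children: List["Operator"] = field(default_factory=list)
    selectivity: float = 1.0                  # assumed fraction of rows a filter keeps
    output_row_size: Optional[float] = None   # assumed row size after a select
    stats: Optional[Stats] = None


def annotate(op: Operator, input_stats: Stats) -> None:
    """Walk the tree top-down and attach statistics to every operator."""
    if op.kind == "FIL":
        # Filter: changes the number of rows, keeps the average row size.
        op.stats = Stats(round(input_stats.num_rows * op.selectivity),
                         input_stats.avg_row_size)
    elif op.kind == "SEL":
        # Select: keeps the number of rows, changes the average row size.
        size = (op.output_row_size if op.output_row_size is not None
                else input_stats.avg_row_size)
        op.stats = Stats(input_stats.num_rows, size)
    else:
        # Table scan (or any operator without a rule here): pass stats through.
        op.stats = Stats(input_stats.num_rows, input_stats.avg_row_size)
    for child in op.children:
        annotate(child, op.stats)


# Illustrative tree: table scan -> filter -> select, with made-up numbers.
sel = Operator("SEL", output_row_size=20.0)
fil = Operator("FIL", children=[sel], selectivity=0.5)
scan = Operator("TS", children=[fil])

# Stand-in for statistics that would be read from the metastore.
annotate(scan, Stats(num_rows=1000, avg_row_size=100.0))

for op in (scan, fil, sel):
    print(op.kind, op.stats.num_rows, op.stats.avg_row_size, op.stats.data_size)
```

In this sketch the filter halves the row count and leaves the row size alone, 
while the select leaves the row count alone and shrinks the row size, so the 
estimated data size drops at each step.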

Rules for different operators are added as comments in the code. For more 
detailed information, the reference book that I am using is "Database Systems: 
The Complete Book" by Garcia-Molina et al.

  was:Currently the statistics gathered at table/partition level and column 
level are not used during query planning stage. Statistics at table/partition 
and column level can be used for optimizing the query plans. Basic statistics 
like uncompressed data size can be used for better reducer estimation. Other 
statistics like number of rows, distinct values of columns, average length of 
columns etc. can be used by Cost Based Optimizer (CBO) for making better query 
plan selection. As a first step in improving query planning the statistics that 
are available in the metastore should be attached to hive operator tree. The 
operator tree should be walked and annotated with statistics information. The 
attached statistics will vary for each operator depending on the operation it 
performs. For example, select operator will change the average row size but 
doesn't affect the number of rows. Similarly filter operator will change the 
number of rows but doesn't change the average row size. Similar rules can be 
applied for other operators as well. 

    
> Annotate hive operator tree with statistics from metastore
> ----------------------------------------------------------
>
>                 Key: HIVE-5369
>                 URL: https://issues.apache.org/jira/browse/HIVE-5369
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor, Statistics
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: statistics
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5369.WIP.txt
>
