[ https://issues.apache.org/jira/browse/HIVE-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989494#comment-12989494 ]

Anja Gruenheid commented on HIVE-1940:
--------------------------------------

As a first step, I would like to take a closer look at collecting metadata at the
column level. HIVE-33 describes five statistics that have been proposed as column
metadata: the number of distinct values, the number of null values, the three
minimum values, the three maximum values, and the average size of the column. As
a reference, I would use the existing implementation of table/partition metadata
collection.
As far as I can tell, deriving histograms is somewhat more complex than obtaining
these column statistics, which is why I want to start with the column statistics.
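To make the five HIVE-33 statistics concrete, here is a minimal Python sketch that computes them for a single column held in memory. The names `ColumnStats` and `compute_column_stats` are hypothetical and are not Hive or MetaStore APIs. "Average size" is assumed here to mean the mean UTF-8 byte length of the string form of the non-null values. The distinct count is exact, which would not scale to large tables.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ColumnStats:
    """Hypothetical container for the five per-column statistics from HIVE-33."""
    num_distinct: int
    num_nulls: int
    min_values: List[object]  # up to k smallest distinct non-null values, ascending
    max_values: List[object]  # up to k largest distinct non-null values, descending
    avg_size: float           # assumption: mean UTF-8 byte length of str(value), non-null only


def compute_column_stats(values: Iterable[Optional[object]], k: int = 3) -> ColumnStats:
    """Single pass over an in-memory column (hypothetical helper, not a Hive API).

    Uses an exact set for distinct values; a distributed implementation would
    compute partial stats per task, merge them, and likely estimate the
    distinct count approximately instead.
    """
    distinct = set()
    num_nulls = 0
    num_non_null = 0
    total_size = 0
    for v in values:
        if v is None:
            num_nulls += 1
            continue
        distinct.add(v)
        num_non_null += 1
        total_size += len(str(v).encode("utf-8"))
    avg_size = total_size / num_non_null if num_non_null else 0.0
    return ColumnStats(
        num_distinct=len(distinct),
        num_nulls=num_nulls,
        min_values=heapq.nsmallest(k, distinct),
        max_values=heapq.nlargest(k, distinct),
        avg_size=avg_size,
    )
```

For example, `compute_column_stats([5, None, 3, 9, 3, 1, None, 7])` reports 5 distinct values, 2 nulls, minimums `[1, 3, 5]`, maximums `[9, 7, 5]`, and an average size of 1.0 byte. Keeping only the top and bottom k values (rather than sorting the whole column) is what makes the "3 min / 3 max" statistics cheap to maintain.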

Is there an up-to-date MetaStore DDL script or an E/R model?

> Query Optimization Using Column Metadata and Histograms
> -------------------------------------------------------
>
>                 Key: HIVE-1940
>                 URL: https://issues.apache.org/jira/browse/HIVE-1940
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>            Reporter: Anja Gruenheid
>
> The current basis for cost-based query optimization in Hive is information
> gathered on tables and partitions. To make further improvements in query
> optimization possible, the next step is to develop and implement ways to
> gather information on columns, as discussed in issue HIVE-33. After that,
> histograms could be implemented as a way to collect and use run-time
> statistics. In addition to implementing these features, it is also
> necessary to develop a consistent storage model for the MetaStore.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
