Any query to a partition column should access the metastore and not the data
----------------------------------------------------------------------------

                 Key: HIVE-2232
                 URL: https://issues.apache.org/jira/browse/HIVE-2232
             Project: Hive
          Issue Type: Improvement
          Components: Metastore, Query Processor, Server Infrastructure
            Reporter: Adam Kramer
            Priority: Minor


The metastore already records every value of every partition column 
(including subpartitions). So any query that reads or uses only data from 
partition columns should be answered from the metastore, without a table scan.

For example:

CREATE TABLE t1 (value1 STRING) PARTITIONED BY (ds STRING, key STRING);
CREATE TABLE t2 (key STRING, value2 STRING) PARTITIONED BY (ds STRING);

...

SELECT t2.key, t1.value1, t2.value2 FROM t1 JOIN t2 ON t1.key=t2.key AND 
t1.ds='2010-01-01' AND t2.ds='2010-01-01';

...ideally, the JOIN in this case would run very quickly, without scanning 
every row of t1, because key is a partition column of t1 and so every value 
of t1.key is already in the metastore. This is just one example. Partition 
pruning is another, and it already works well.
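
For comparison, Hive can already list partition values from the metastore 
alone through SHOW PARTITIONS. The sketch below assumes the t1 table defined 
above. The final SELECT is only an illustration of the kind of query this 
issue is about (one that references nothing but partition columns), not a 
statement about how Hive currently plans it.

```sql
-- Existing command: lists the partitions of t1 using metastore metadata.
SHOW PARTITIONS t1;

-- Same, restricted to one ds value with a partial partition spec.
SHOW PARTITIONS t1 PARTITION (ds='2010-01-01');

-- Illustrative query touching only partition columns of t1. The values it
-- needs are the same ones SHOW PARTITIONS reports, so in principle it could
-- be answered without reading any rows.
SELECT DISTINCT key FROM t1 WHERE ds='2010-01-01';
```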

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
