Thiruvel Thirumoolan created HIVE-7604:
------------------------------------------

             Summary: Add Metastore API to fetch one or more partition names
                 Key: HIVE-7604
                 URL: https://issues.apache.org/jira/browse/HIVE-7604
             Project: Hive
          Issue Type: New Feature
          Components: Metastore
            Reporter: Thiruvel Thirumoolan
            Assignee: Thiruvel Thirumoolan
             Fix For: 0.14.0


We need a new API in Metastore to address the following use cases. Both use 
cases arise from having tables with hundreds of thousands or in some cases 
millions of partitions.

1. It should be quick and easy to obtain the distinct values of a partition 
key, e.g. all dates for which partitions are available. Tools and frameworks 
can use this programmatically to find gaps in partitions before reprocessing 
them. Currently one has to run Hive queries (JDBC or CLI) to obtain this 
information, which is unfriendly and heavyweight. For tables with a large 
number of partitions, those queries take a long time to run and also require 
a large heap.
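
To make the first use case concrete, here is a minimal sketch of the work a 
client has to do itself once it has fetched every partition name. It assumes 
names in the usual key=value/key=value form and ignores escaping of special 
characters in values. The class and method names (PartitionValues, 
distinctValues) are hypothetical and are not part of any existing or proposed 
Metastore API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PartitionValues {

    // Collects the distinct values of one partition key from partition
    // names such as "dt=2014-08-01/region=us". Hypothetical helper:
    // escaped characters in values are not decoded.
    static SortedSet<String> distinctValues(List<String> partitionNames, String key) {
        SortedSet<String> values = new TreeSet<>();
        String prefix = key + "=";
        for (String name : partitionNames) {
            for (String part : name.split("/")) {
                if (part.startsWith(prefix)) {
                    values.add(part.substring(prefix.length()));
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample names only; a real client would have to fetch the full
        // list from the metastore first.
        List<String> names = Arrays.asList(
                "dt=2014-08-01/region=us",
                "dt=2014-08-01/region=eu",
                "dt=2014-08-03/region=us");
        System.out.println(distinctValues(names, "dt"));
    }
}
```

With millions of partitions, the full list of names has to be transferred 
and held in client memory just to produce a handful of distinct values, which 
is the cost this request aims to avoid.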

2. Users typically want to know which partitions are available and run 
queries that involve only partition keys (select distinct partkey1 from 
table), or they want the latest date partition of a dimension table to join 
against a fact table (select * from fact_table join select max(dt) from 
dimension_table). Such metadata-only queries can be pushed to the metastore 
and need not be run in Hive at all, not even locally. If they can be 
converted into queries against the metastore database, clients can stay 
lightweight and need not fetch all partition names. The results can be 
obtained much faster and with fewer resources.
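
As an illustration of the max(dt) case, the sketch below computes the latest 
value of a partition key on the client from a list of partition names. It 
assumes values that sort correctly as strings, such as yyyy-MM-dd dates, and 
the names LatestPartition and latestValue are hypothetical. A metastore-side 
API could return the same single value without shipping every name to the 
client.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestPartition {

    // Returns the largest value of the given partition key, comparing
    // values as plain strings. Assumption: values such as yyyy-MM-dd
    // dates order correctly under string comparison.
    static Optional<String> latestValue(List<String> partitionNames, String key) {
        String prefix = key + "=";
        return partitionNames.stream()
                .flatMap(name -> Arrays.stream(name.split("/")))
                .filter(part -> part.startsWith(prefix))
                .map(part -> part.substring(prefix.length()))
                .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // Sample names only, standing in for a dimension table's partitions.
        List<String> names = Arrays.asList(
                "dt=2014-07-30",
                "dt=2014-08-02",
                "dt=2014-08-01");
        System.out.println(latestValue(names, "dt").orElse("no partitions"));
    }
}
```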



--
This message was sent by Atlassian JIRA
(v6.2#6252)
