[ https://issues.apache.org/jira/browse/HIVE-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HIVE-7604:
---------------------------------------

    Attachment: Design_HIVE_7604.1.txt

Thanks [~ashutoshc], uploading a revised document with additional information 
on return values. Let me know if it's unclear.

> Add Metastore API to fetch one or more partition names
> ------------------------------------------------------
>
>                 Key: HIVE-7604
>                 URL: https://issues.apache.org/jira/browse/HIVE-7604
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>             Fix For: 0.14.0
>
>         Attachments: Design_HIVE_7604.1.txt, Design_HIVE_7604.txt
>
>
> We need a new API in Metastore to address the following use cases. Both use 
> cases arise from having tables with hundreds of thousands or in some cases 
> millions of partitions.
> 1. It should be quick and easy to obtain the distinct values of a partition 
> key, e.g. all dates for which partitions are available. Tools and 
> frameworks can use this programmatically to find gaps in partitions before 
> reprocessing them. Currently one has to run Hive queries (JDBC or CLI) to 
> obtain this information, which is unfriendly and heavyweight. For tables 
> with a large number of partitions, these queries take a long time to run 
> and require a large heap.
> 2. Users typically want to know which partitions are available and run 
> queries that involve only partition keys (select distinct partkey1 from 
> table), or want the latest date partition of a dimension table to join 
> against a fact table (select * from fact_table join select max(dt) from 
> dimension_table). Such metadata-only queries can be pushed to the metastore 
> and need not even be run locally in Hive. If they can be converted into 
> database queries, clients can stay lightweight and need not fetch all 
> partition names, and results are obtained much faster with fewer resources.
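For illustration, the answers both use cases ask for (distinct values of a key, and the latest value of a key) can be derived from partition names. Below is a minimal client-side sketch, assuming names of the form key=value/key=value with no escaped characters (Hive escapes special characters in partition names, which this sketch ignores). The class and method names are hypothetical and are not part of the Metastore API. The proposal is to have the metastore compute this with database queries instead of shipping every partition name to the client.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Hive Metastore API.
class PartitionNameValues {

    // Returns the sorted distinct values of partKey across partition names
    // such as "dt=2014-08-01/hr=00". Assumes unescaped key=value pairs
    // separated by '/'.
    static TreeSet<String> distinctValues(List<String> partNames, String partKey) {
        TreeSet<String> values = new TreeSet<>();
        for (String name : partNames) {
            for (String component : name.split("/")) {
                int eq = component.indexOf('=');
                if (eq > 0 && component.substring(0, eq).equals(partKey)) {
                    values.add(component.substring(eq + 1));
                }
            }
        }
        return values;
    }

    // Rough equivalent of "select max(dt)": the lexicographic maximum, which
    // matches chronological order only for zero-padded formats like yyyy-MM-dd.
    static Optional<String> maxValue(List<String> partNames, String partKey) {
        TreeSet<String> values = distinctValues(partNames, partKey);
        return values.isEmpty() ? Optional.empty() : Optional.of(values.last());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "dt=2014-08-01/hr=00",
            "dt=2014-08-01/hr=01",
            "dt=2014-08-03/hr=00");
        System.out.println(distinctValues(names, "dt")); // [2014-08-01, 2014-08-03]
        System.out.println(maxValue(names, "dt").orElse("none")); // 2014-08-03
    }
}
```

Doing this on the client means fetching every partition name first, which is exactly the cost the proposed API aims to avoid for tables with millions of partitions.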



--
This message was sent by Atlassian JIRA
(v6.2#6252)
