[ 
https://issues.apache.org/jira/browse/HIVE-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111480#comment-14111480
 ] 

Ashutosh Chauhan commented on HIVE-7604:
----------------------------------------

Looks good to me. I couldn't understand {{PartitionValuesResponse}} completely. 
Can you add a small description of it in your design doc?

> Add Metastore API to fetch one or more partition names
> ------------------------------------------------------
>
>                 Key: HIVE-7604
>                 URL: https://issues.apache.org/jira/browse/HIVE-7604
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>             Fix For: 0.14.0
>
>         Attachments: Design_HIVE_7604.txt
>
>
> We need a new API in Metastore to address the following use cases. Both use 
> cases arise from having tables with hundreds of thousands or, in some cases, 
> millions of partitions.
> 1. It should be quick and easy to obtain the distinct values of a partition 
> key, e.g., all dates for which partitions are available. Tools and 
> frameworks can use this programmatically to find gaps in partitions before 
> reprocessing them. Currently one has to run Hive queries (JDBC or CLI) to 
> obtain this information, which is unfriendly and heavyweight. For tables 
> with a large number of partitions, the queries take a long time to run and 
> also require a large heap.
> 2. Users typically want to know the list of available partitions and run 
> queries that involve only partition keys (select distinct partkey1 from 
> table), or want the latest date partition of a dimension table in order to 
> join it against another fact table (select * from fact_table join select 
> max(dt) from dimension_table). Such metadata-only queries can be pushed to 
> the metastore and need not be run even locally in Hive. If the queries can 
> be converted into database-based queries, the clients can be lightweight 
> and need not fetch all partition names. The results can be obtained much 
> faster and with fewer resources.
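
To make the two use cases concrete, here is a minimal client-side sketch of what callers currently have to do with a full list of partition names. It is not the proposed Metastore API: the function names and the sample data are hypothetical, and it assumes partition names in the usual key=value/key=value form with unescaped values and ISO-formatted dates.

```python
from datetime import date, timedelta


def partition_values(partition_names, key):
    """Return the sorted distinct values of `key` in Hive-style partition names.

    Hypothetical helper; assumes names look like 'dt=2014-08-01/region=us'.
    """
    values = set()
    for name in partition_names:
        for part in name.split("/"):
            k, _, v = part.partition("=")
            if k == key:
                values.add(v)
    return sorted(values)


def missing_dates(date_values):
    """Return the ISO dates absent between the earliest and latest value."""
    present = {date.fromisoformat(v) for v in date_values}
    if not present:
        return []
    day, last = min(present), max(present)
    gaps = []
    while day <= last:
        if day not in present:
            gaps.append(day.isoformat())
        day += timedelta(days=1)
    return gaps


# Made-up sample data standing in for a table's partition names.
names = [
    "dt=2014-08-01/region=us",
    "dt=2014-08-01/region=eu",
    "dt=2014-08-02/region=us",
    "dt=2014-08-04/region=us",
]

dates = partition_values(names, "dt")  # use case 1: distinct values of one key
gaps = missing_dates(dates)            # use case 1: gaps before reprocessing
latest = max(dates)                    # use case 2: like select max(dt)
```

With millions of partitions, fetching every name just to compute these three small results is the cost the description wants to avoid by evaluating such requests inside the metastore.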



--
This message was sent by Atlassian JIRA
(v6.2#6252)
