[ https://issues.apache.org/jira/browse/HIVE-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiruvel Thirumoolan updated HIVE-7604:
---------------------------------------
    Attachment: Design_HIVE_7604.1.txt

Thanks [~ashutoshc], uploading revised document with additional information for return values. Let me know if it's unclear.

> Add Metastore API to fetch one or more partition names
> ------------------------------------------------------
>
>                 Key: HIVE-7604
>                 URL: https://issues.apache.org/jira/browse/HIVE-7604
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>             Fix For: 0.14.0
>
>         Attachments: Design_HIVE_7604.1.txt, Design_HIVE_7604.txt
>
>
> We need a new API in Metastore to address the following use cases. Both use
> cases arise from having tables with hundreds of thousands or, in some cases,
> millions of partitions.
> 1. It should be quick and easy to obtain the distinct values of a partition
> key. E.g., obtain all dates for which partitions are available. This can be
> used by tools/frameworks programmatically to understand gaps in partitions
> before reprocessing them. Currently one has to run Hive queries (JDBC or CLI)
> to obtain this information, which is unfriendly and heavyweight. For tables
> with a large number of partitions, the queries take a long time to run and
> also require a large heap.
> 2. Typically users would like to know the list of partitions available and
> would run queries that involve only partition keys (select distinct
> partkey1 from table), or queries that obtain the latest date partition from
> a dimension table to join against another fact table (select * from
> fact_table join select max(dt) from dimension_table). Those queries
> (metadata-only queries) can be pushed to the metastore and need not be run
> even locally in Hive. If the queries can be converted into database-based
> queries, the clients can be lightweight and need not fetch all partition
> names. The results can be obtained much faster and with fewer resources.
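
To make the two use cases concrete, here is a minimal sketch of the results they ask for, computed client-side over a list of partition names. It assumes names in the usual key=value/key=value form (for example dt=2014-08-01/region=us) and ignores any escaping of special characters. The function names are hypothetical and are not the API proposed in the attached design document; the point of the proposal is that the metastore would compute these answers so clients do not have to fetch every partition name as this sketch does.

```python
from typing import Iterable, List, Optional


def partition_value(name: str, key: str) -> Optional[str]:
    """Return the value of `key` in a name like 'dt=2014-08-01/region=us'."""
    for part in name.split("/"):
        k, sep, v = part.partition("=")
        if sep and k == key:
            return v
    return None  # the key does not appear in this partition name


def distinct_partition_values(names: Iterable[str], key: str) -> List[str]:
    """Use case 1: sorted distinct values of one partition key."""
    values = {partition_value(n, key) for n in names}
    values.discard(None)  # drop names that lack the key
    return sorted(values)


def latest_partition_value(names: Iterable[str], key: str) -> Optional[str]:
    """Use case 2: the max(dt)-style lookup, using string ordering."""
    values = distinct_partition_values(names, key)
    return values[-1] if values else None
```

String ordering matches date ordering only when the values are in a sortable format such as yyyy-MM-dd; that is an assumption of the sketch, not something stated in the issue.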