[
https://issues.apache.org/jira/browse/HIVE-24263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on HIVE-24263 started by Szehon Ho.
----------------------------------------
> Create an HMS endpoint to list partition locations
> --------------------------------------------------
>
> Key: HIVE-24263
> URL: https://issues.apache.org/jira/browse/HIVE-24263
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-24263.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> At our company, we have a use case that needs to quickly get a list of
> partition locations. Currently this is done via listPartitions, which is a
> very heavy operation in terms of memory and performance.
> This JIRA proposes an API: Map<String, String> listPartitionLocations(String
> db, String table, short max) that returns a map of partition names to
> locations.
> For example, we have an integration from the output of a Hive pipeline to
> Spark jobs that consume directly from HDFS. The Spark job scheduler needs to
> know the partition paths that are available for consumption (the partition
> name is not sufficient, as its input is an HDFS path), and so we have to make
> a heavy listPartitions() call for this.
> Another use case is an HDFS data removal tool that does a nightly crawl to
> see whether any Hive partitions are mapped to a given partition path.
> The nightly crawling job could be much less resource-intensive if we had a
> listPartitionLocations().
> As the ObjectStore already has an internal method for this, used by
> dropPartitions, it is only a matter of exposing this API to
> HiveMetaStoreClient.
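
To make the proposed call and the two use cases concrete, here is a minimal sketch. It does not use the real metastore: PartitionLocationLister, InMemoryLister and isPathMapped are hypothetical names invented for illustration, the database, table and HDFS paths are made up, and treating a negative max as "no limit" is an assumption of this sketch rather than something the description specifies. Only the method signature is taken from the description above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionLocationSketch {

    /** Hypothetical shape of the proposed call; not an existing HiveMetaStoreClient method. */
    public interface PartitionLocationLister {
        Map<String, String> listPartitionLocations(String db, String table, short max);
    }

    /** Toy in-memory stand-in for the metastore, for illustration only. */
    public static class InMemoryLister implements PartitionLocationLister {
        // "db.table" -> (partition name -> location), kept in insertion order
        private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

        public void addPartition(String db, String table, String partName, String location) {
            tables.computeIfAbsent(db + "." + table, k -> new LinkedHashMap<>())
                  .put(partName, location);
        }

        @Override
        public Map<String, String> listPartitionLocations(String db, String table, short max) {
            Map<String, String> all =
                tables.getOrDefault(db + "." + table, Collections.<String, String>emptyMap());
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : all.entrySet()) {
                // Assumption of this sketch: a negative max means "no limit".
                if (max >= 0 && result.size() >= max) {
                    break;
                }
                result.put(e.getKey(), e.getValue());
            }
            return result;
        }
    }

    /** Crawler-style check: is any listed partition mapped to this path? */
    public static boolean isPathMapped(Map<String, String> locations, String path) {
        return locations.containsValue(path);
    }

    public static void main(String[] args) {
        InMemoryLister hms = new InMemoryLister();
        hms.addPartition("sales", "orders", "ds=2020-10-01",
                "hdfs://nn/warehouse/orders/ds=2020-10-01");
        hms.addPartition("sales", "orders", "ds=2020-10-02",
                "hdfs://nn/warehouse/orders/ds=2020-10-02");

        // Scheduler use case: name -> path pairs without loading full partition objects.
        Map<String, String> locations = hms.listPartitionLocations("sales", "orders", (short) -1);
        locations.forEach((name, loc) -> System.out.println(name + " -> " + loc));

        // Removal-tool use case: check whether a crawled path still backs a partition.
        System.out.println(isPathMapped(locations, "hdfs://nn/warehouse/orders/ds=2020-10-03"));
    }
}
```

The point of the shape is that the caller only ever handles two strings per partition, which is what both the scheduler and the crawler need; a real implementation would fetch those from the metastore backend rather than from an in-memory map.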