[ https://issues.apache.org/jira/browse/HIVE-24263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szehon Ho resolved HIVE-24263.
------------------------------
    Resolution: Duplicate

> Create an HMS endpoint to list partition locations
> --------------------------------------------------
>
>                 Key: HIVE-24263
>                 URL: https://issues.apache.org/jira/browse/HIVE-24263
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24263.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In our company, we have a use case that needs to quickly get a list of
> partition locations. Currently this is done via listPartitions, which is a
> very heavy operation in terms of memory and performance.
> This JIRA proposes an API: Map<String, String> listPartitionLocations(String
> db, String table, short max) that returns a map of partition names to
> locations.
> For example, we have an integration from the output of a Hive pipeline to
> Spark jobs that consume directly from HDFS. The Spark job scheduler needs to
> know the partition paths that are available for consumption (the partition
> name is not sufficient, as its input is an HDFS path), and so we have to do a
> heavy listPartitions() for this.
> Another use case is an HDFS data removal tool that does a nightly crawl to
> see if there are associated Hive partitions mapped to a given partition path.
> The nightly crawling job could be much less resource-intensive if we had a
> listPartitionLocations().
> As the ObjectStore already has an internal method for this, used by
> dropPartitions, it is only a matter of exposing this API to
> HiveMetaStoreClient.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
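
The contract of the proposed listPartitionLocations(String db, String table, short max) can be illustrated with a minimal in-memory sketch. This is not the Hive implementation or the attached patch: the interface PartitionLocationSource, the class InMemorySource, and the sample database, table, and HDFS paths are all hypothetical, and the sketch assumes that a negative max means "no limit".

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionLocationSketch {

    /** Hypothetical stand-in for the proposed endpoint; not the actual Hive API. */
    interface PartitionLocationSource {
        /** Returns partition name -> location for at most max partitions. */
        Map<String, String> listPartitionLocations(String db, String table, short max);
    }

    /** In-memory implementation, for illustration only. */
    static final class InMemorySource implements PartitionLocationSource {
        // "db.table" -> (partition name -> location), kept in insertion order.
        private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

        void addPartition(String db, String table, String partName, String location) {
            tables.computeIfAbsent(db + "." + table, k -> new LinkedHashMap<>())
                  .put(partName, location);
        }

        @Override
        public Map<String, String> listPartitionLocations(String db, String table, short max) {
            Map<String, String> all =
                tables.getOrDefault(db + "." + table, Collections.emptyMap());
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : all.entrySet()) {
                // Assumption of this sketch: a negative max means "no limit".
                if (max >= 0 && result.size() >= max) {
                    break;
                }
                result.put(e.getKey(), e.getValue());
            }
            return result;
        }
    }

    public static void main(String[] args) {
        InMemorySource source = new InMemorySource();
        // Hypothetical partitions and paths.
        source.addPartition("sales", "orders", "ds=2020-10-01",
            "hdfs://nn/warehouse/sales.db/orders/ds=2020-10-01");
        source.addPartition("sales", "orders", "ds=2020-10-02",
            "hdfs://nn/warehouse/sales.db/orders/ds=2020-10-02");

        // A scheduler only needs names and paths, not full partition objects.
        Map<String, String> locations =
            source.listPartitionLocations("sales", "orders", (short) -1);
        locations.forEach((name, path) -> System.out.println(name + " -> " + path));
    }
}
```

The point of the sketch is the shape of the result: a caller such as the Spark job scheduler or the nightly crawler receives only partition names and paths, rather than the full partition objects that listPartitions returns.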