[jira] [Work logged] (HIVE-24263) Create an HMS endpoint to list partition locations

ASF GitHub Bot (Jira) Fri, 25 Dec 2020 17:00:37 -0800


     [ https://issues.apache.org/jira/browse/HIVE-24263?focusedWorklogId=528475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528475 ]


ASF GitHub Bot logged work on HIVE-24263:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Dec/20 00:59
            Start Date: 26/Dec/20 00:59
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] closed pull request #1572:
URL: https://github.com/apache/hive/pull/1572


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 528475)
    Time Spent: 40m  (was: 0.5h)

> Create an HMS endpoint to list partition locations
> --------------------------------------------------
>
>                 Key: HIVE-24263
>                 URL: https://issues.apache.org/jira/browse/HIVE-24263
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-24263.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In our company, we have a use-case to quickly get a list of partition 
> locations.  Currently this is done via listPartitions, which is a very expensive 
> operation in terms of memory and performance.
> This JIRA proposes an API: Map<String, String> listPartitionLocations(String 
> db, String table, short max) that returns a map of partition names to 
> locations.
> For example, we have an integration from the output of a Hive pipeline to Spark 
> jobs that consume directly from HDFS.  The Spark job scheduler needs to know 
> the partition paths that are available for consumption (the partition name is 
> not sufficient, as the job's input is an HDFS path), so we have to make 
> expensive listPartitions() calls for this.
> Another use-case is an HDFS data removal tool that does a nightly crawl to 
> check whether any Hive partitions are mapped to a given partition path. 
>  The nightly crawling job could be much less resource-intensive if we had a 
> listPartitionLocations() call.
> As ObjectStore already has an internal method that does this for 
> dropPartitions, it is only a matter of exposing this API to 
> HiveMetaStoreClient.
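The proposed call can be illustrated with a small, self-contained Java sketch. The interface below only mirrors the signature quoted in the description (Map<String, String> listPartitionLocations(String db, String table, short max)); the names PartitionLocationLister, InMemoryLister and isPathReferenced, and the treatment of a negative max as "no limit", are hypothetical assumptions for illustration. This is not the actual HiveMetaStoreClient API or the ObjectStore implementation. It shows how both use-cases above need only partition names and paths rather than full Partition objects.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionLocationsSketch {

    /** Hypothetical interface mirroring the signature proposed in HIVE-24263. */
    interface PartitionLocationLister {
        /**
         * Returns partition name -> location for up to {@code max} partitions.
         * ASSUMPTION: a negative max means "no limit".
         */
        Map<String, String> listPartitionLocations(String db, String table, short max);
    }

    /** In-memory stand-in for the metastore, for illustration only. */
    static final class InMemoryLister implements PartitionLocationLister {
        // key: "db.table"; value: partition name -> location, in insertion order
        private final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

        void addPartition(String db, String table, String partName, String location) {
            tables.computeIfAbsent(db + "." + table, k -> new LinkedHashMap<>())
                  .put(partName, location);
        }

        @Override
        public Map<String, String> listPartitionLocations(String db, String table, short max) {
            Map<String, String> all =
                tables.getOrDefault(db + "." + table, Collections.emptyMap());
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : all.entrySet()) {
                // Stop once the requested limit is reached (negative = unlimited).
                if (max >= 0 && result.size() >= max) {
                    break;
                }
                result.put(e.getKey(), e.getValue());
            }
            return result;
        }
    }

    /** Data-removal use-case: is any partition of the table mapped to this path? */
    static boolean isPathReferenced(PartitionLocationLister client,
                                    String db, String table, String path) {
        // (short) -1 requests all partitions under the "negative = no limit" assumption.
        return client.listPartitionLocations(db, table, (short) -1).containsValue(path);
    }

    public static void main(String[] args) {
        InMemoryLister client = new InMemoryLister();
        client.addPartition("sales", "orders", "ds=2020-12-24",
            "hdfs://nn/warehouse/orders/ds=2020-12-24");
        client.addPartition("sales", "orders", "ds=2020-12-25",
            "hdfs://nn/warehouse/orders/ds=2020-12-25");

        // Spark use-case: hand the partition paths (not names) to a downstream scheduler.
        for (String location :
                client.listPartitionLocations("sales", "orders", (short) -1).values()) {
            System.out.println(location);
        }

        // Cleanup use-case: check whether an HDFS path still backs a partition.
        System.out.println(isPathReferenced(client, "sales", "orders",
            "hdfs://nn/warehouse/orders/ds=2020-12-01"));
    }
}
```

Returning only a name-to-location map keeps each result entry to two strings, which is the memory saving the description is after compared with materializing full partition objects.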



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
