If you have access to HCatalog, it also offers a JDBC connection that would allow you to get faster responses.
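As a minimal sketch of the JDBC route (assuming a HiveServer2 endpoint at `jdbc:hive2://hive-host:10000/default` and a table/partition name that are all hypothetical placeholders here, with the standard `hive-jdbc` driver on the classpath), you can fetch a partition's location by parsing the `Location:` row of `DESCRIBE FORMATTED`:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PartitionLocationViaJdbc {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and credentials; replace with your own.
        String url = "jdbc:hive2://hive-host:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             // DESCRIBE FORMATTED returns one row per metadata field;
             // the partition's path shows up in the "Location:" row.
             ResultSet rs = stmt.executeQuery(
                 "DESCRIBE FORMATTED mytable PARTITION (ds='2019-11-01')")) {
            while (rs.next()) {
                String colName = rs.getString(1);
                if (colName != null && colName.trim().startsWith("Location")) {
                    System.out.println(rs.getString(2).trim());
                }
            }
        }
    }
}
```

This is still one statement per partition, but the calls are cheap enough that you can batch or parallelize them from a single connection.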
On Tue, Nov 12, 2019 at 6:53 AM Elliot West <tea...@gmail.com> wrote:
> Hello,
>
> We faced a similar problem. Additionally, we had job clients that were
> difficult to integrate directly with the Thrift API but needed to resolve
> file locations via the metastore. To handle this, we built a cut-down
> service with a REST API that fronts the Hive metastore. The API is
> optimised for this specific case of retrieving lists of file locations for
> some set of partitions. A nice thing about this approach is that our users
> are able to fetch file lists that are easy to parse, using very simple
> integrations (a curl command, for example).
>
> Cheers,
>
> Elliot.
>
> On Fri, 8 Nov 2019 at 15:44, <m...@datameer.com> wrote:
>
>> Hi,
>> I have a question about how to get the locations for a bunch of partitions.
>> My answer so far is: using the Hive query `DESCRIBE EXTENDED <tablename>
>> PARTITION(<partition_name>)`
>>
>> I'm getting back a JSON response (if I set the return type to JSON) which
>> has the HDFS location in it.
>>
>> But if I have, let's say, 1000 partitions and every query needs 0.5 sec,
>> I have to wait 500 sec.
>>
>> So my question is: do you have a single query to gather all locations?
>> Or do you have a workaround to get the locations faster?
>> I'm thinking about querying the metastore RDS directly, similar to
>> http://www.openkb.info/2015/04/how-to-list-table-or-partition-location.html
>> But in an enterprise environment I'm pretty sure this approach would not be
>> the best, because the RDS (MySQL or Derby) may not be reachable, or
>> I may not have permission to access it.
>>
>> Any other hints or ideas on how to get all partition locations from, let's
>> say, an external table with custom partition locations in a performant way?
>>
>> Thanks
>> Marko
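For the original question of gathering every location at once: if you can reach the metastore's Thrift port directly, the client bundled with Hive returns all partitions, locations included, in a single call, avoiding both the per-partition DESCRIBE round trips and any direct access to the backing RDS. A minimal sketch, assuming a metastore at `thrift://metastore-host:9083` and a `default.mytable` table (both hypothetical placeholders), with `hive-metastore` on the classpath:

```java
import java.util.List;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class AllPartitionLocations {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // Hypothetical metastore URI; point this at your own deployment.
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            // One Thrift call fetches all partitions (-1 = no limit); each
            // carries a storage descriptor holding its HDFS location.
            List<Partition> partitions =
                client.listPartitions("default", "mytable", (short) -1);
            for (Partition p : partitions) {
                System.out.println(p.getValues() + " -> "
                    + p.getSd().getLocation());
            }
        } finally {
            client.close();
        }
    }
}
```

This is the same Thrift API that Elliot's REST service fronts; wrapping it as above is essentially what such a service would do internally, minus the simpler curl-friendly interface.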