Hello,

We faced a similar problem. Additionally, we had job clients that were
difficult to integrate directly with the Thrift API, but which needed to
resolve file locations via the metastore. To handle this, we built a
cut-down service with a REST API that fronts the Hive metastore. The API
is optimised for this specific case of retrieving lists of file locations
for some set of partitions. A nice thing about this approach is that our
users are able to fetch file lists that are easy to parse, using very
simple integrations (a curl command, for example).
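
For illustration only (the endpoint, host, and names below are made up
rather than our real API), a client call looks something like:

    curl 'https://metastore-front.example.com/v1/locations?db=sales&table=orders&partitions=dt%3D2019-11-08'

    # hypothetical response shape:
    # ["hdfs://namenode/warehouse/sales.db/orders/dt=2019-11-08"]

The speed-up comes from the service issuing a single bulk metastore call
rather than one DESCRIBE per partition. A minimal sketch of that call,
assuming a metastore at thrift://metastore-host:9083 and a table
default.sales (both illustrative):

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Partition;

    public class PartitionLocations {
      public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // illustrative metastore URI; point this at your own metastore
        conf.setVar(HiveConf.ConfVars.METASTOREURIS,
            "thrift://metastore-host:9083");
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        // one Thrift round trip for all partitions ((short) -1 = no limit)
        for (Partition p :
            client.listPartitions("default", "sales", (short) -1)) {
          System.out.println(p.getSd().getLocation());
        }
        client.close();
      }
    }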

Cheers,

Elliot.

On Fri, 8 Nov 2019 at 15:44, <m...@datameer.com> wrote:

> Hi,
> I have a question about how to get the locations for a bunch of
> partitions. My current approach is the Hive query `DESCRIBE EXTENDED
> <tablename> PARTITION(<partition_name>)`.
>
> I get back a JSON response (if I set the return type to JSON) which
> has the HDFS location in it.
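>
> For example (table and partition names made up; I switch the output
> to JSON via hive.ddl.output.format):
>
>     SET hive.ddl.output.format=json;
>     DESCRIBE EXTENDED sales PARTITION (dt='2019-11-08');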
>
> But if I have, let's say, 1000 partitions and every query takes 0.5
> sec, I have to wait 500 sec.
>
> So my question is, do you have a single query to gather all locations?
> Or do you have a workaround to get the locations faster?
> I am thinking about querying the metastore RDS directly, similar to
> http://www.openkb.info/2015/04/how-to-list-table-or-partition-location.html
> (a query along the lines of the sketch below). But in an enterprise
> environment I'm pretty sure this approach would not be the best,
> because the RDS (MySQL or Derby) may not be reachable, or I may not
> have permission to access it.
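>
> A sketch of what I mean, against the standard metastore schema (the
> database and table names are made up):
>
>     SELECT p.PART_NAME, s.LOCATION
>     FROM PARTITIONS p
>     JOIN TBLS t ON p.TBL_ID = t.TBL_ID
>     JOIN DBS d ON t.DB_ID = d.DB_ID
>     JOIN SDS s ON p.SD_ID = s.SD_ID
>     WHERE d.NAME = 'default' AND t.TBL_NAME = 'sales';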
>
> Any other hint or idea on how to get all the partition locations
> from, let's say, an external table with custom partition locations in
> a performant way?
>
> Thanks
> Marko
>
