If you have access to HCatalog, it also has a JDBC connection that would
allow you to get a faster response.
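
As a rough sketch (not from the thread; host, credentials, and table names
are placeholders), connecting through the HiveServer2 JDBC driver and reusing
one open session for all the lookups could look like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PartitionList {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hs2-host:10000/default", "user", "");
                 Statement stmt = conn.createStatement()) {
                // One round trip enumerates the partitions; any further
                // DESCRIBE calls can reuse this same session instead of
                // paying connection setup once per partition.
                ResultSet rs = stmt.executeQuery("SHOW PARTITIONS mydb.mytable");
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }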

On Tue, Nov 12, 2019 at 6:53 AM Elliot West <tea...@gmail.com> wrote:

> Hello,
>
> We faced a similar problem. Additionally, we had job clients that were
> difficult to integrate directly with the Thrift API, but needed to resolve
> file locations via the metastore. To handle this, we built a cut-down service
> service with a REST API that fronts the Hive metastore. The API is
> optimised for this specific case of retrieving lists of file locations for
> some set of partitions. A nice thing about this approach is that our users
> are able to fetch file lists that are easy to parse, using very simple
> integrations (a curl command for example).
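>
> For a sense of what such a service wraps, a minimal sketch of the
> underlying metastore call (the standard Hive client API; database and
> table names are placeholders, and this is not our actual service code)
> could be:
>
>     import org.apache.hadoop.hive.conf.HiveConf;
>     import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
>     import org.apache.hadoop.hive.metastore.api.Partition;
>
>     public class ListLocations {
>         public static void main(String[] args) throws Exception {
>             // A single Thrift call returns every partition; each one
>             // carries a storage descriptor with its file location.
>             HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
>             for (Partition p : client.listPartitions("mydb", "mytable", (short) -1)) {
>                 System.out.println(p.getSd().getLocation());
>             }
>             client.close();
>         }
>     }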
>
> Cheers,
>
> Elliot.
>
> On Fri, 8 Nov 2019 at 15:44, <m...@datameer.com> wrote:
>
>> Hi,
>> I have a question about how to get the locations for a bunch of partitions.
>> My current approach is to use the Hive query `DESCRIBE EXTENDED <tablename>
>> PARTITION(<partition_name>)`.
>>
>> I get back a JSON response (if I set the return type to JSON) which has the
>> HDFS location in it.
>>
>> But if I have, let's say, 1000 partitions and every query needs 0.5 sec,
>> I have to wait 500 sec.
>>
>> So my question is, do you have a single query to gather all locations?
>> Or do you have a workaround to get the locations faster?
>> I have thought about querying the metastore RDS directly, similar to
>>
>> http://www.openkb.info/2015/04/how-to-list-table-or-partition-location.html
>> But in an enterprise environment I'm pretty sure this approach would not be
>> the best, because the RDS (MySQL or Derby) may not be reachable, or I may
>> not have permission to access it.
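>>
>> If the metastore DB were reachable, I guess the lookup would be a single
>> join over the standard metastore schema, something like the sketch below
>> (the schema is internal to Hive and can change between versions; the URL
>> and credentials are placeholders):
>>
>>     import java.sql.Connection;
>>     import java.sql.DriverManager;
>>     import java.sql.PreparedStatement;
>>     import java.sql.ResultSet;
>>
>>     public class MetastoreLocations {
>>         public static void main(String[] args) throws Exception {
>>             // One query over the metastore's internal tables returns
>>             // every partition name with its storage location.
>>             String sql =
>>                 "SELECT p.PART_NAME, s.LOCATION "
>>               + "FROM PARTITIONS p "
>>               + "JOIN TBLS t ON p.TBL_ID = t.TBL_ID "
>>               + "JOIN DBS d ON t.DB_ID = d.DB_ID "
>>               + "JOIN SDS s ON p.SD_ID = s.SD_ID "
>>               + "WHERE d.NAME = ? AND t.TBL_NAME = ?";
>>             try (Connection conn = DriverManager.getConnection(
>>                      "jdbc:mysql://metastore-db:3306/hive", "user", "pass");
>>                  PreparedStatement ps = conn.prepareStatement(sql)) {
>>                 ps.setString(1, "mydb");
>>                 ps.setString(2, "mytable");
>>                 try (ResultSet rs = ps.executeQuery()) {
>>                     while (rs.next()) {
>>                         System.out.println(rs.getString(1) + "\t" + rs.getString(2));
>>                     }
>>                 }
>>             }
>>         }
>>     }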
>>
>> Any other hint or idea on how to get all partition locations (let's say from
>> an external table with custom partition locations) in a performant way?
>>
>> Thanks
>> Marko
>>
>>
>>
