Re: Iterating over partitions using the metastore API

Furcy Pin Thu, 04 Aug 2016 05:16:23 -0700

Hi Elliot,

I guess you can use IMetaStoreClient.listPartitionNames instead, and then
call IMetaStoreClient.getPartition for each partition.
This might be slow though, as you will have to make one call per
partition (10,000 or more) to get them all.
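
To make that concrete, here is a rough sketch of the idea rather than a
tested implementation. It lists the names once (plain strings, so much
lighter than full Partition objects) and then fetches the partitions a
batch at a time, so only one batch is on the heap at any moment. The class
name BatchedPartitionIterator and the fetchBatch callback are made up for
illustration. With a batch size of 1 this is exactly the "one getPartition
per name" approach. I believe IMetaStoreClient also has a
getPartitionsByNames(db, table, names) method that would cut the number of
round trips, but please check that against your Hive version.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper: walks a list of partition names and fetches the
 * corresponding partitions lazily, one fixed-size batch at a time.
 * P stands in for org.apache.hadoop.hive.metastore.api.Partition.
 */
final class BatchedPartitionIterator<P> implements Iterator<P> {
    private final List<String> names;
    private final int batchSize;
    private final Function<List<String>, List<P>> fetchBatch;
    private int nextIndex = 0;
    private Iterator<P> current = Collections.emptyIterator();

    BatchedPartitionIterator(List<String> names, int batchSize,
                             Function<List<String>, List<P>> fetchBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.names = names;
        this.batchSize = batchSize;
        this.fetchBatch = fetchBatch;
    }

    @Override
    public boolean hasNext() {
        // Load the next batch only when the current one is exhausted. The loop
        // also skips batches that come back empty (e.g. partitions dropped
        // between listing the names and fetching them).
        while (!current.hasNext() && nextIndex < names.size()) {
            int end = Math.min(nextIndex + batchSize, names.size());
            current = fetchBatch.apply(names.subList(nextIndex, end)).iterator();
            nextIndex = end;
        }
        return current.hasNext();
    }

    @Override
    public P next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Wiring it to the metastore would look roughly like passing
client.listPartitionNames(db, table, (short) -1) as the names, and a
callback that calls the client for each batch. The client methods throw
checked exceptions, so the callback would need to wrap them in an unchecked
one.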

Another option I'd consider is connecting directly to the database that
backs the Hive metastore.
This requires a little more configuration (granting your process read-only
access to the metastore database), and might make your implementation
dependent on the metastore's underlying implementation (mysql, postgres,
derby), unless you use an ORM to query it.
Anyway, you could ask the metastore database directly via JDBC for all the
partitions, and get a java.sql.ResultSet that can be iterated over.
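
Something along these lines, again only a sketch. The table and column
names (PARTITIONS.PART_NAME, TBLS.TBL_NAME, DBS.NAME) are what I remember
from the metastore schema and are an assumption: the schema is internal to
Hive, can change between versions, and on postgres the identifiers may need
quoting. The class and method names are made up for the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

/** Hypothetical helper that streams partition names from the metastore DB. */
final class MetastoreJdbcPartitions {
    // Assumed metastore schema; verify against your own installation.
    static final String QUERY =
        "SELECT p.PART_NAME FROM PARTITIONS p "
      + "JOIN TBLS t ON p.TBL_ID = t.TBL_ID "
      + "JOIN DBS d ON t.DB_ID = d.DB_ID "
      + "WHERE d.NAME = ? AND t.TBL_NAME = ?";

    /**
     * Passes each partition name to the action, one row at a time, and
     * returns the number of rows seen.
     */
    static long forEachPartitionName(Connection conn, String db, String table,
                                     Consumer<String> action) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            // Only a hint: some drivers still buffer the whole result set
            // unless they are explicitly configured for streaming.
            stmt.setFetchSize(1000);
            stmt.setString(1, db);
            stmt.setString(2, table);
            long count = 0;
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    action.accept(rs.getString(1));
                    count++;
                }
            }
            return count;
        }
    }
}
```

The connection would come from DriverManager.getConnection with the same
JDBC URL the metastore itself uses, but with a read-only account.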

Regards,

Furcy

On Thu, Aug 4, 2016 at 1:29 PM, Elliot West <tea...@gmail.com> wrote:

> Hello,
>
> I have a process that needs to iterate over all of the partitions in a
> table using the metastore API. The process should not need to know about the
> structure or meaning of the partition key values (i.e. whether they are
> dates, numbers, country names etc), or be required to know the existing
> range of partition values. Note that the process only needs to know about
> one partition at any given time.
>
> Currently I am naively using the IMetaStoreClient.listPartitions(String,
> String, short) method to retrieve all partitions but clearly this is not
> scalable for tables with many 10,000s of partitions. I'm finding that even
> with relatively large heaps I'm running into OOM exceptions when the
> metastore API is building the List<Partition> return value. I've
> experimented with using IMetaStoreClient.listPartitionSpecs(String,
> String, int) but this too seems to have high memory requirements.
>
> Can anyone suggest how I can better iterate over partitions in a manner
> that is more considerate of memory usage?
>
> Thanks,
>
> Elliot.
>
>
