Hello,

I have a process that needs to iterate over all of the partitions in a
table using the metastore API. The process should not need to know about the
structure or meaning of the partition key values (i.e. whether they are
dates, numbers, country names, etc.), or be required to know the existing
range of partition values. Note that the process only needs to know about
one partition at any given time.

Currently I am naively using the IMetaStoreClient.listPartitions(String,
String, short) method to retrieve all partitions but clearly this is not
scalable for tables with tens of thousands of partitions. I'm finding that even
with relatively large heaps I'm running into OOM exceptions when the
metastore API is building the List<Partition> return value. I've
experimented with using IMetaStoreClient.listPartitionSpecs(String, String,
int) but this too seems to have high memory requirements.

Can anyone suggest how I can better iterate over partitions in a manner
that is more considerate of memory usage?
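
To illustrate the access pattern I'm after, here is a minimal sketch in
plain Java. It is not an existing metastore facility: BatchedIterator, its
keys list and its fetcher function are hypothetical names used only for
illustration. The idea is to hold a list of lightweight identifiers and
load the full objects one bounded batch at a time, so the caller still
sees one element at a time.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper: iterates over values by loading them in fixed-size
 * batches, so at most one batch of full objects is held at any time.
 */
final class BatchedIterator<K, V> implements Iterator<V> {
    private final List<K> keys;                        // lightweight identifiers, e.g. partition names
    private final Function<List<K>, List<V>> fetcher;  // loads the full objects for one batch of keys
    private final int batchSize;
    private int nextKeyIndex = 0;
    private Iterator<V> currentBatch = Collections.emptyIterator();

    BatchedIterator(List<K> keys, int batchSize, Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Fetch further batches until one is non-empty or the keys run out.
        while (!currentBatch.hasNext() && nextKeyIndex < keys.size()) {
            int end = Math.min(nextKeyIndex + batchSize, keys.size());
            currentBatch = fetcher.apply(keys.subList(nextKeyIndex, end)).iterator();
            nextKeyIndex = end;
        }
        return currentBatch.hasNext();
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentBatch.next();
    }
}
```

Against the metastore, the keys might come from
IMetaStoreClient.listPartitionNames(String, String, short) and the fetcher
might call IMetaStoreClient.getPartitionsByNames(String, String,
List<String>), but that pairing is only an assumption on my part; I have
not verified that it behaves well at this scale.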

Thanks,

Elliot.
