Hello,

I have a process that needs to iterate over all of the partitions in a table using the metastore API. The process should not need to know about the structure or meaning of the partition key values (i.e. whether they are dates, numbers, country names, etc.), nor should it need to know the existing range of partition values. Note that the process only needs to know about one partition at any given time.
Currently I am naively using the IMetaStoreClient.listPartitions(String, String, short) method to retrieve all partitions, but this is clearly not scalable for tables with tens of thousands of partitions. Even with relatively large heaps, I'm running into OOM exceptions while the metastore API is building the List<Partition> return value. I've also experimented with IMetaStoreClient.listPartitionSpecs(String, String, int), but this too seems to have high memory requirements. Can anyone suggest a way to iterate over partitions that is more economical with memory?

Thanks,
Elliot.
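One approach that may help, sketched here under assumptions rather than as a confirmed fix: fetch only the partition *names* up front (short strings such as "dt=2015-01-01/country=UK", which the process can treat as opaque), then materialise full Partition objects in small batches and hand them out one at a time. The sketch assumes the Hive version in use exposes IMetaStoreClient.listPartitionNames(String, String, short) and IMetaStoreClient.getPartitionsByNames(String, String, List<String>). To keep it self-contained, those two calls sit behind a hypothetical PartitionSource interface, and BatchedPartitionIterator is likewise an illustrative name, not part of Hive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical abstraction over the two metastore calls this sketch relies on. */
interface PartitionSource<P> {
    /** All partition names for the table, e.g. "dt=2015-01-01/country=UK". */
    List<String> listPartitionNames() throws Exception;

    /** Full partition objects for just the given names. */
    List<P> getPartitionsByNames(List<String> names) throws Exception;
}

/**
 * Yields partitions one at a time. Only the name strings are held for the whole
 * table; full partition objects are fetched at most batchSize at a time.
 * Partitions dropped after the names were listed are simply skipped, and
 * partitions added afterwards are not seen.
 */
final class BatchedPartitionIterator<P> implements Iterator<P> {
    private final PartitionSource<P> source;
    private final List<String> names;
    private final int batchSize;
    private int nextNameIndex = 0;
    private Iterator<P> currentBatch = Collections.emptyIterator();

    BatchedPartitionIterator(PartitionSource<P> source, int batchSize) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
        this.names = source.listPartitionNames();
    }

    @Override
    public boolean hasNext() {
        // Loop so that a batch coming back empty does not end iteration early.
        while (!currentBatch.hasNext() && nextNameIndex < names.size()) {
            int end = Math.min(nextNameIndex + batchSize, names.size());
            // Copy the slice so the source never receives a view of the full list.
            List<String> slice = new ArrayList<>(names.subList(nextNameIndex, end));
            nextNameIndex = end;
            try {
                currentBatch = source.getPartitionsByNames(slice).iterator();
            } catch (Exception e) {
                throw new RuntimeException("Failed to fetch partition batch", e);
            }
        }
        return currentBatch.hasNext();
    }

    @Override
    public P next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentBatch.next();
    }
}
```

To wire this to a real metastore, an adapter could (assuming the signatures above) implement listPartitionNames() as client.listPartitionNames(db, table, (short) -1) and getPartitionsByNames(names) as client.getPartitionsByNames(db, table, names). The batch size then bounds how many full Partition objects, each carrying its own StorageDescriptor, are live at once.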