Thanks for your reply. I hadn't considered driving it from a list of
partition names.
To avoid the N+1 reads I am considering reading in batches like so:
- Sorting the names
- Taking every nth name (where n is the batch size) to use as a batch
boundary.
- Building a filter derived from
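The sorting and boundary steps above could be sketched in Java roughly like this. It is a minimal sketch that assumes the partition names are the plain strings the metastore returns (e.g. `dt=2014-01-01`) and that ordinary lexicographic `String` ordering is acceptable. The class and method names (`PartitionBatches`, `boundaries`, `batches`) are hypothetical. The filter-building step is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: splits partition names into sorted, contiguous batches. */
public class PartitionBatches {

    private static void checkBatchSize(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
    }

    /** Returns a sorted copy so the caller's list is left untouched. */
    private static List<String> sortedCopy(List<String> partitionNames) {
        List<String> sorted = new ArrayList<>(partitionNames);
        Collections.sort(sorted); // plain lexicographic String order
        return sorted;
    }

    /** Every n-th sorted name (n = batchSize); each one is the first name of a batch. */
    public static List<String> boundaries(List<String> partitionNames, int batchSize) {
        checkBatchSize(batchSize);
        List<String> sorted = sortedCopy(partitionNames);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += batchSize) {
            result.add(sorted.get(i));
        }
        return result;
    }

    /** The sorted names grouped into batches of at most batchSize names. */
    public static List<List<String>> batches(List<String> partitionNames, int batchSize) {
        checkBatchSize(batchSize);
        List<String> sorted = sortedCopy(partitionNames);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += batchSize) {
            int end = Math.min(i + batchSize, sorted.size());
            result.add(new ArrayList<>(sorted.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "dt=2014-01-03", "dt=2014-01-01", "dt=2014-01-05", "dt=2014-01-02", "dt=2014-01-04");
        System.out.println(boundaries(names, 2)); // prints [dt=2014-01-01, dt=2014-01-03, dt=2014-01-05]
        System.out.println(batches(names, 2));
    }
}
```

The names are compared as strings, so unpadded numeric values sort lexicographically (`n=10` before `n=9`). Each batch is therefore a contiguous run of the sorted names rather than a numeric range, and any filter derived from the boundaries would need to compare values in the same way.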
Hi Elliot,
I guess you can use IMetaStoreClient.listPartitionNames instead, and then
use IMetaStoreClient.getPartition for each partition.
This might be slow though, as you will have to make 10 000 calls to get
them.
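The small Java sketch below makes the cost of the per-partition approach concrete. It makes one call to list the names and then one call per name, so N partitions cost N + 1 round trips. The `PartitionSource` interface and its `listNames`/`getByName` methods are hypothetical stand-ins for `IMetaStoreClient.listPartitionNames` and `IMetaStoreClient.getPartition`. The real methods have different signatures and `getPartition` returns a Thrift `Partition` object rather than a string. The in-memory `CountingSource` only counts calls and does not talk to a metastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NPlusOneSketch {

    /** Hypothetical stand-in for the two metastore calls; NOT the real IMetaStoreClient API. */
    public interface PartitionSource {
        List<String> listNames(String db, String table);

        String getByName(String db, String table, String name);
    }

    /** In-memory fake that counts every call as if it were a round trip to the server. */
    public static class CountingSource implements PartitionSource {
        private final Map<String, String> partitions = new TreeMap<>();
        private int callCount = 0;

        public CountingSource(List<String> names) {
            for (String name : names) {
                partitions.put(name, "location-of-" + name);
            }
        }

        @Override
        public List<String> listNames(String db, String table) {
            callCount++;
            return new ArrayList<>(partitions.keySet());
        }

        @Override
        public String getByName(String db, String table, String name) {
            callCount++;
            return partitions.get(name);
        }

        public int getCallCount() {
            return callCount;
        }
    }

    /** One list call, then one get call per name: N + 1 calls in total. */
    public static List<String> fetchAll(PartitionSource source, String db, String table) {
        List<String> result = new ArrayList<>();
        for (String name : source.listNames(db, table)) {
            result.add(source.getByName(db, table, name));
        }
        return result;
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(
                Arrays.asList("dt=2014-01-01", "dt=2014-01-02", "dt=2014-01-03"));
        fetchAll(source, "default", "events");
        System.out.println(source.getCallCount()); // prints 4 (1 listNames call + 3 getByName calls)
    }
}
```

With 10 000 partitions the same loop would make 10 001 calls, which is the N+1 cost that the batching idea tries to avoid.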
Another option I'd consider is connecting directly to the Hive metastore.
This
Hello,
I have a process that needs to iterate over all of the partitions in a
table using the metastore API. The process should not need to know about the
structure or meaning of the partition key values (e.g. whether they are
dates, numbers, country names, etc.), or be required to know the existing