Re: Iterating over partitions using the metastore API

2016-08-04 Thread Elliot West

Thanks for your reply. I hadn't considered driving it from a list of partition names. To avoid the N+1 reads I am considering reading in batches like so:
- Sorting the names
- Taking every nth name (where n is the batch size) to use as a batch boundary.
- Building a filter derived from
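The first two steps of this batching idea could be sketched roughly as follows. This is only an illustration: the class and method names (PartitionBatches, boundaries, batches) are hypothetical, plain lexicographic ordering of the names is assumed, and the filter-building step is left out because the message is cut off before describing it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: sorts partition names and derives batch boundaries. */
final class PartitionBatches {

    /** Returns a sorted copy of the names (assumption: lexicographic order). */
    static List<String> sorted(List<String> partitionNames) {
        List<String> copy = new ArrayList<>(partitionNames);
        Collections.sort(copy);
        return copy;
    }

    /** Returns every n-th sorted name (indices 0, n, 2n, ...) as a batch boundary. */
    static List<String> boundaries(List<String> partitionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> names = sorted(partitionNames);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            result.add(names.get(i));
        }
        return result;
    }

    /** Splits the sorted names into consecutive batches of at most batchSize names. */
    static List<List<String>> batches(List<String> partitionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> names = sorted(partitionNames);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            int end = Math.min(i + batchSize, names.size());
            result.add(new ArrayList<>(names.subList(i, end)));
        }
        return result;
    }
}
```

Each boundary is the first name of its batch, so a batch covers the names from one boundary up to (but not including) the next.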

Re: Iterating over partitions using the metastore API

2016-08-04 Thread Furcy Pin

Hi Elliot, I guess you can use IMetaStoreClient.listPartitionNames instead, and then use IMetaStoreClient.getPartition for each partition. This might be slow though, as you will have to make 10 000 calls to get them. Another option I'd consider is connecting directly to the Hive metastore. This
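A minimal sketch of the name-driven loop described here, written against a small stand-in interface so that it compiles without Hive on the classpath. PartitionSource and PartitionWalker are hypothetical names; the two methods only mirror the calls named in the message, and the real IMetaStoreClient signatures may differ (for example, by taking an extra limit argument).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two metastore client calls named above. */
interface PartitionSource<P> {
    List<String> listPartitionNames(String db, String table);

    P getPartition(String db, String table, String partitionName);
}

/** Hypothetical helper that walks a table one partition at a time. */
final class PartitionWalker {

    /** One call to list the names, then one call per name: the N+1 pattern. */
    static <P> List<P> fetchAll(PartitionSource<P> source, String db, String table) {
        List<P> partitions = new ArrayList<>();
        for (String name : source.listPartitionNames(db, table)) {
            partitions.add(source.getPartition(db, table, name));
        }
        return partitions;
    }
}
```

The loop never inspects the partition key values, only the opaque names, at the cost of one round trip per partition.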

Iterating over partitions using the metastore API

2016-08-04 Thread Elliot West

Hello, I have a process that needs to iterate over all of the partitions in a table using the metastore API. The process should not need to know about the structure or meaning of the partition key values (i.e. whether they are dates, numbers, country names, etc.), or be required to know the existing