Todd Lipcon created HIVE-19715:
----------------------------------

             Summary: Consolidated and flexible API for fetching partition 
metadata from HMS
                 Key: HIVE-19715
                 URL: https://issues.apache.org/jira/browse/HIVE-19715
             Project: Hive
          Issue Type: New Feature
          Components: Standalone Metastore
            Reporter: Todd Lipcon


Currently, the HMS Thrift API exposes 17 different APIs for fetching 
partition-related information. There is something of a combinatorial explosion 
going on, where each API has variants with and without "auth" info, by pspecs 
vs names, by filters, by exprs, etc. Maintaining all of these separate APIs 
long term is a burden, and the number of variants is confusing for consumers.

Additionally, even with all of these APIs, there is a lack of granularity in 
fetching only the information needed for a particular use case. For example, in 
some use cases it may be beneficial to only fetch the partition locations 
without wasting effort fetching statistics, etc.

This JIRA proposes that we add a new "one API to rule them all" for fetching 
partition info. The request and response would be encapsulated in structs. Some 
desirable properties:
- the request should be able to specify which pieces of information are 
required (eg location, properties, etc)
- in the case of partition parameters, the request should be able to do either 
whitelisting or blacklisting (eg to exclude large incremental column stats HLL 
dumped in there by Impala)
- the request should optionally specify auth info (to encompass the "with_auth" 
variants)
- the request should be able to designate the set of partitions to access 
through one of several different methods (eg "all", list<name>, expr, 
part_vals, etc) 
- the struct should be easily evolvable so that new pieces of info can be added
- the response should be designed in such a way as to avoid transferring 
redundant information for common cases (eg simple "dictionary coding" of 
strings like parameter names, etc)
- the API should support some form of pagination for tables with large 
partition counts
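
To make the properties above concrete, here is a minimal in-memory sketch in 
Java of what such a request/response pair might look like. It is not the 
proposed Thrift definition: every name in it (PartitionFetchSketch, Request, 
Response, Field, excludedParams, paramKeyDictionary, and so on) is hypothetical 
and chosen only for illustration. It covers field selection, parameter 
blacklisting, dictionary coding of parameter names, and offset-based 
pagination; auth info and the different partition-selection methods are left 
out.

```java
import java.util.*;

final class PartitionFetchSketch {
    /** Pieces of partition info a caller can ask for (hypothetical names). */
    enum Field { NAME, LOCATION, PARAMETERS }

    /** Hypothetical request: field selection, parameter blacklist, paging. */
    static final class Request {
        EnumSet<Field> fields = EnumSet.of(Field.NAME);
        Set<String> excludedParams = new HashSet<>(); // parameter keys to skip
        int offset = 0;        // index of the first partition to return
        int maxResults = 100;  // page size
    }

    /** One partition in the response; unrequested fields stay null/empty. */
    static final class Row {
        String name;
        String location;
        // Maps an index into Response.paramKeyDictionary to the value.
        Map<Integer, String> params = new LinkedHashMap<>();
    }

    /** Hypothetical response: each parameter key is sent once, in a dictionary. */
    static final class Response {
        List<String> paramKeyDictionary = new ArrayList<>();
        List<Row> rows = new ArrayList<>();
        boolean hasMore; // true if partitions remain after this page
    }

    /** Stand-in for a partition as stored by the metastore. */
    static final class Partition {
        final String name;
        final String location;
        final Map<String, String> params;

        Partition(String name, String location, Map<String, String> params) {
            this.name = name;
            this.location = location;
            this.params = params;
        }
    }

    /** Builds one page of results, copying only what the request asked for. */
    static Response fetch(List<Partition> all, Request req) {
        Response resp = new Response();
        Map<String, Integer> dictIndex = new HashMap<>();
        int end = Math.min(all.size(), req.offset + req.maxResults);
        for (int i = req.offset; i < end; i++) {
            Partition p = all.get(i);
            Row row = new Row();
            if (req.fields.contains(Field.NAME)) row.name = p.name;
            if (req.fields.contains(Field.LOCATION)) row.location = p.location;
            if (req.fields.contains(Field.PARAMETERS)) {
                for (Map.Entry<String, String> e : p.params.entrySet()) {
                    if (req.excludedParams.contains(e.getKey())) continue;
                    Integer idx = dictIndex.get(e.getKey());
                    if (idx == null) { // first time this key is seen
                        idx = resp.paramKeyDictionary.size();
                        resp.paramKeyDictionary.add(e.getKey());
                        dictIndex.put(e.getKey(), idx);
                    }
                    row.params.put(idx, e.getValue());
                }
            }
            resp.rows.add(row);
        }
        resp.hasMore = end < all.size();
        return resp;
    }
}
```

A caller that only needs locations would set fields to 
EnumSet.of(Field.LOCATION) and never pay for parameters or statistics. Keeping 
the request and response as structs means a new piece of info can be added as 
one more enum value and one more optional field, without adding another API 
variant.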
