Quanlong Huang created HIVE-26893:
-------------------------------------

             Summary: Extend get partitions APIs to ignore partition schemas
                 Key: HIVE-26893
                 URL: https://issues.apache.org/jira/browse/HIVE-26893
             Project: Hive
          Issue Type: New Feature
          Components: Metastore
            Reporter: Quanlong Huang


There are several HMS APIs that return a list of partitions, e.g. 
get_partitions_ps(), get_partitions_by_names(), add_partitions_req() with 
needResult=true, etc. Each returned partition instance carries its own list of 
FieldSchemas as the partition schema:
{code:java}
org.apache.hadoop.hive.metastore.api.Partition
-> org.apache.hadoop.hive.metastore.api.StorageDescriptor
   ->  cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema> {code}
These lists can take up a lot of memory for wide tables (e.g. with 2k columns), 
because the column list is repeated in every partition. See the heap histogram 
in IMPALA-11812 for an example.

Some engines, like Impala, don't actually use or respect the partition-level 
schema, so transmitting it wastes network and serde resources. It would be nice 
if these APIs provided an optional boolean flag for ignoring partition schemas, 
so that HMS clients (e.g. Impala) don't need to clear them afterwards to save 
memory.
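
To illustrate the client-side workaround described above, here is a minimal, self-contained sketch. The classes FieldSchema, StorageDescriptor and Partition below are simplified stand-ins written for this example, not the Thrift-generated org.apache.hadoop.hive.metastore.api classes, and the helper names (widePartition, countFieldSchemas, clearPartitionSchemas) are hypothetical. It only shows the shape of the problem: every partition holds its own column list, and the client has to drop those lists after they have already been transmitted and deserialized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PartitionSchemaSketch {
    // Stand-in for the HMS FieldSchema struct (assumption: only name and type matter here).
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Stand-in for the HMS StorageDescriptor struct; only the cols field is modeled.
    static class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
    }

    // Stand-in for the HMS Partition struct.
    static class Partition {
        final List<String> values;
        final StorageDescriptor sd = new StorageDescriptor();
        Partition(List<String> values) {
            this.values = values;
        }
    }

    // Builds a partition whose storage descriptor has its own list of numCols columns.
    static Partition widePartition(String value, int numCols) {
        Partition p = new Partition(Collections.singletonList(value));
        for (int i = 0; i < numCols; i++) {
            p.sd.cols.add(new FieldSchema("col" + i, "string"));
        }
        return p;
    }

    // Counts the FieldSchema objects held across all partitions.
    static long countFieldSchemas(List<Partition> parts) {
        long total = 0;
        for (Partition p : parts) {
            total += p.sd.cols.size();
        }
        return total;
    }

    // The client-side workaround: drop the per-partition column lists after the fetch.
    static void clearPartitionSchemas(List<Partition> parts) {
        for (Partition p : parts) {
            p.sd.cols = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        List<Partition> parts = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            parts.add(widePartition("p" + i, 2000));
        }
        System.out.println(countFieldSchemas(parts)); // prints 200000
        clearPartitionSchemas(parts);
        System.out.println(countFieldSchemas(parts)); // prints 0
    }
}
```

With 100 partitions of a 2000-column table, the client briefly holds 200,000 FieldSchema objects that it then discards. A server-side flag as proposed here would avoid creating and shipping them in the first place.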



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
