Quanlong Huang created HIVE-26893:
-------------------------------------

             Summary: Extend get partitions APIs to ignore partition schemas
                 Key: HIVE-26893
                 URL: https://issues.apache.org/jira/browse/HIVE-26893
             Project: Hive
          Issue Type: New Feature
          Components: Metastore
            Reporter: Quanlong Huang

There are several HMS APIs that return a list of partitions, e.g. get_partitions_ps(), get_partitions_by_names(), and add_partitions_req() with needResult=true. Each partition instance carries its own list of FieldSchemas as the partition schema:

{code:java}
org.apache.hadoop.hive.metastore.api.Partition
 -> org.apache.hadoop.hive.metastore.api.StorageDescriptor
   -> cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema>
{code}

This can occupy a large memory footprint for wide tables (e.g. with 2k cols). See the heap histogram in IMPALA-11812 for an example.

Some engines, like Impala, don't actually use or respect the partition-level schema, so transmitting it is a waste of network and serde resources.

It'd be nice if these APIs provided an optional boolean flag for ignoring partition schemas, so that HMS clients (e.g. Impala) don't need to clear them later to save memory.
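
To illustrate the client-side cleanup that such a flag would make unnecessary, here is a minimal sketch. The nested classes are hypothetical stand-ins that model only the relevant fields of the Thrift-generated Partition, StorageDescriptor, and FieldSchema; they are not the real HMS classes, and clearPartitionSchemas() and newPartition() are names invented for this example. It is not the code Impala uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the Thrift-generated HMS classes. Only the
// fields relevant to this issue are modelled (assumption, not the real API).
class PartitionSchemaSketch {
    static class FieldSchema {
        final String name;
        final String type;

        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        List<String> values = new ArrayList<>();
        StorageDescriptor sd = new StorageDescriptor();
    }

    /**
     * Drops the per-partition column list from every partition and returns
     * the number of FieldSchema entries that were released.
     */
    static int clearPartitionSchemas(List<Partition> partitions) {
        int cleared = 0;
        for (Partition p : partitions) {
            if (p.sd != null && p.sd.cols != null) {
                cleared += p.sd.cols.size();
                p.sd.cols = null; // the client falls back to the table-level schema
            }
        }
        return cleared;
    }

    /** Builds a partition holding its own private copy of the column list. */
    static Partition newPartition(String value, int numCols) {
        Partition p = new Partition();
        p.values.add(value);
        for (int i = 0; i < numCols; i++) {
            p.sd.cols.add(new FieldSchema("col_" + i, "string"));
        }
        return p;
    }

    public static void main(String[] args) {
        // Three partitions of a wide table with 2k columns each.
        List<Partition> parts = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            parts.add(newPartition("p" + i, 2000));
        }
        int cleared = clearPartitionSchemas(parts);
        System.out.println("FieldSchema objects released: " + cleared);
    }
}
```

In this sketch three partitions with 2000 columns each hold 6000 FieldSchema objects that the client deserializes only to discard. With the proposed flag the server would omit them, so they would never be transmitted.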