[ https://issues.apache.org/jira/browse/IMPALA-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Noémi Pap-Takács updated IMPALA-14564:
--------------------------------------
Description:
File descriptors store the partition information (spec id and partition keys).
Depending on the partitioning, partition keys can consist of many string fields
corresponding to the partition values. Storing these keys redundantly for each
file descriptor object adds a large overhead both to catalogd's memory and to
the serialized data (TIcebergTable.TIcebergContentFileStore) that the Catalog
sends to the Coordinator.
Removing the partition info from file descriptors could significantly reduce
their size.
The partition keys could instead be stored once in a map (id -> partition info)
that gets sent along with the file descriptors. Each file descriptor would then
hold only a partition id, and the values could be looked up in the map by that id.
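
The proposed layout can be sketched roughly as follows. This is only an
illustration of the idea, not Impala's implementation: the class and field
names (ContentFileStore, FileDescriptor, partition_id, add_file, partition_of)
are hypothetical, and the real structures are Thrift/FlatBuffer objects in the
catalog rather than Python classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the partition info: (spec id, partition key values).
PartitionInfo = Tuple[int, Tuple[str, ...]]


@dataclass(frozen=True)
class FileDescriptor:
    """Hypothetical file descriptor that holds only a partition id."""
    path: str
    partition_id: int  # key into ContentFileStore.partitions


class ContentFileStore:
    """Hypothetical store that keeps each distinct partition info once."""

    def __init__(self) -> None:
        self.partitions: Dict[int, PartitionInfo] = {}  # id -> partition info
        self._ids: Dict[PartitionInfo, int] = {}        # reverse index for dedup
        self.files: List[FileDescriptor] = []

    def add_file(self, path: str, spec_id: int, keys: Sequence[str]) -> int:
        info: PartitionInfo = (spec_id, tuple(keys))
        pid = self._ids.get(info)
        if pid is None:
            # First file seen for this partition: assign a new id.
            pid = len(self._ids)
            self._ids[info] = pid
            self.partitions[pid] = info
        self.files.append(FileDescriptor(path, pid))
        return pid

    def partition_of(self, fd: FileDescriptor) -> PartitionInfo:
        # Look up the shared partition info by the id stored in the descriptor.
        return self.partitions[fd.partition_id]


store = ContentFileStore()
store.add_file("a.parquet", 0, ["2024", "01"])
store.add_file("b.parquet", 0, ["2024", "01"])
store.add_file("c.parquet", 0, ["2024", "02"])
```

In this sketch three files share two map entries, so the partition key strings
are stored per partition rather than per file. The map would have to be
serialized together with the file descriptors so that the Coordinator can
resolve the ids.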
> Remove redundant partition information from Iceberg file descriptors
> ---------------------------------------------------------------------
>
> Key: IMPALA-14564
> URL: https://issues.apache.org/jira/browse/IMPALA-14564
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Reporter: Noémi Pap-Takács
> Assignee: Noémi Pap-Takács
> Priority: Major
> Labels: impala-iceberg