[
https://issues.apache.org/jira/browse/IMPALA-13737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928683#comment-17928683
]
ASF subversion and git services commented on IMPALA-13737:
----------------------------------------------------------
Commit 37e409059437279c960eba71b3bce69ffbd65f2e in impala's branch
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=37e409059 ]
IMPALA-13737: Directly load file metadata via IcebergFileMetadataLoader
Currently we let HdfsTable drive file metadata loading for Iceberg
tables. To have better control over file loading, IcebergTable should
use IcebergFileMetadataLoader directly. The underlying HdfsTable can
then be empty, which will make it easier to remove this dependency
completely. This change also de-duplicates file descriptors in Local
Catalog mode.
This patch also clarifies the responsibilities of
IcebergFileMetadataLoader and IcebergContentFileStore. The former
is in charge of loading the file descriptors and decorating them
with Iceberg metadata. The latter is only responsible for grouping
and storing them in an efficient manner.
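The split of responsibilities described above can be illustrated with a
minimal sketch. All names here (SketchFileDesc, SketchLoader, SketchStore)
are hypothetical stand-ins, not Impala's actual classes or signatures, and
the grouping into data and delete files is an assumption made only for
illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a file descriptor; not Impala's real class.
final class SketchFileDesc {
  enum Content { DATA, DELETES }
  final String path;
  final long sizeBytes;
  final Content content;  // stands in for the Iceberg metadata decoration

  SketchFileDesc(String path, long sizeBytes, Content content) {
    this.path = path;
    this.sizeBytes = sizeBytes;
    this.content = content;
  }
}

// Loader role: create descriptors and decorate them with Iceberg metadata.
final class SketchLoader {
  // Each String[] is {path, sizeBytes, content}, standing in for a manifest entry.
  static List<SketchFileDesc> load(List<String[]> entries) {
    List<SketchFileDesc> result = new ArrayList<>();
    for (String[] e : entries) {
      result.add(new SketchFileDesc(
          e[0], Long.parseLong(e[1]), SketchFileDesc.Content.valueOf(e[2])));
    }
    return result;
  }
}

// Store role: only group and store descriptors; no loading, no table access.
final class SketchStore {
  private final Map<String, SketchFileDesc> dataFiles = new LinkedHashMap<>();
  private final Map<String, SketchFileDesc> deleteFiles = new LinkedHashMap<>();

  void add(SketchFileDesc fd) {
    Map<String, SketchFileDesc> group =
        fd.content == SketchFileDesc.Content.DATA ? dataFiles : deleteFiles;
    group.put(fd.path, fd);
  }

  int numDataFiles() { return dataFiles.size(); }
  int numDeleteFiles() { return deleteFiles.size(); }
}

final class LoaderStoreDemo {
  public static void main(String[] args) {
    List<String[]> entries = new ArrayList<>();
    entries.add(new String[] {"/tbl/data/a.parquet", "1024", "DATA"});
    entries.add(new String[] {"/tbl/data/b.parquet", "2048", "DATA"});
    entries.add(new String[] {"/tbl/data/a-del.parquet", "128", "DELETES"});

    SketchStore store = new SketchStore();
    for (SketchFileDesc fd : SketchLoader.load(entries)) store.add(fd);
    System.out.println(store.numDataFiles() + " data, "
        + store.numDeleteFiles() + " delete");  // prints "2 data, 1 delete"
  }
}
```

In this sketch the store receives fully built descriptors and needs no
reference to the table, which mirrors the stated goal of keeping the
store independent of the loading logic.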
This patch also removes the dependency of IcebergContentFileStore on
FeIcebergTable, which will make the REST Catalog implementation
cleaner.
Measurements
(Thanks to Gabor Kaszab for the numbers)
As mentioned above, this patch de-duplicates the file descriptors
in local catalog mode, i.e. it reduces the memory footprint
(IMPALA-11265) in the Coordinator when local catalog is being used,
by roughly 31% in the measurement below.
The measured table had 110k files, 16400 partitions, 1000 manifests,
1000 snapshots. The memory footprint:
Before this patch: 107MB
After this patch: 74MB
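The commit message does not say how the duplicate descriptors arose or
how the patch removes them. As a generic illustration only, one common
way to de-duplicate objects is to intern them by a key, so that every
reference to the same file shares a single instance. Desc and
DescInterner below are hypothetical names, not part of Impala:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical descriptor, identified by its path in this sketch.
final class Desc {
  final String path;
  Desc(String path) { this.path = path; }
}

// Keeps one canonical instance per path; later duplicates are dropped.
final class DescInterner {
  private final Map<String, Desc> canonical = new HashMap<>();

  Desc intern(Desc d) {
    // putIfAbsent returns the existing value, or null if d was stored.
    Desc existing = canonical.putIfAbsent(d.path, d);
    return existing != null ? existing : d;
  }

  int size() { return canonical.size(); }
}

final class DedupDemo {
  public static void main(String[] args) {
    DescInterner interner = new DescInterner();
    Desc first = interner.intern(new Desc("/tbl/data/a.parquet"));
    Desc second = interner.intern(new Desc("/tbl/data/a.parquet"));
    System.out.println(first == second);  // prints "true": one shared instance
    System.out.println(interner.size());  // prints "1"
  }
}
```

With about 110k files, holding each descriptor once instead of several
times is the kind of saving that would show up in the numbers above.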
Testing:
* No new functionality added; existing tests should continue to pass.
Change-Id: Iaf7e23ec21b65036b47edadcb4cbe4b64be3baee
Reviewed-on: http://gerrit.cloudera.org:8080/22458
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Directly load file metadata via IcebergFileMetadataLoader
> ---------------------------------------------------------
>
> Key: IMPALA-13737
> URL: https://issues.apache.org/jira/browse/IMPALA-13737
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
>
> Currently we let HdfsTable drive file metadata loading. To have better
> control over it, IcebergTable should use IcebergFileMetadataLoader directly.
> The underlying HdfsTable can then be empty, and this would also
> de-duplicate file descriptors in Local Catalog mode.
> This is another step towards decoupling IcebergTable from HdfsTable.