Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24052
to look at the new patch set (#3).
Change subject: IMPALA-14792: Try avoiding hadoop.fs.Path when loading Iceberg tables
......................................................................
IMPALA-14792: Try avoiding hadoop.fs.Path when loading Iceberg tables
Quick and dirty solution to speed up IcebergFileMetadataLoader.
Its correctness relies on the assumption that Iceberg file
locations are normalized.
Noticed in flamegraphs that the org.apache.hadoop.fs.Path constructor
is one of the main CPU consumers during Iceberg table loading,
especially during incremental reloads, where most file descriptors
are reused.
hadoop.fs.Path was used to relativize file locations against the base
table location and to extract the "path" part of the URI. Both can
be done with simple String operations if we can assume that the
URIs are normalized.
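A minimal Java sketch of what such String-only operations could look like, assuming normalized locations (no "." or ".." segments, no duplicate slashes, same scheme and authority spelling as the table location). This is not the code in the patch; the class NormalizedLocations and its methods relativize() and pathPart() are hypothetical names used only for illustration.

```java
import java.net.URI;

// Hypothetical helper (not part of Impala) sketching String-based
// replacements for org.apache.hadoop.fs.Path operations.
// ASSUMPTION: all locations are normalized; for non-normalized input the
// results can differ from the Path-based versions.
final class NormalizedLocations {
  private NormalizedLocations() {}

  // Returns 'location' relative to 'tableLocation', or null if it is not
  // strictly under it.
  static String relativize(String tableLocation, String location) {
    // Append '/' so that "s3a://b/tbl" is not a prefix match for "s3a://b/tbl2/x".
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    if (!location.startsWith(prefix)) return null;
    return location.substring(prefix.length());
  }

  // Returns the "path" part of a location, similar to URI.getPath() but
  // without percent-decoding and without allocating a URI object.
  static String pathPart(String location) {
    int colon = location.indexOf(':');
    int firstSlash = location.indexOf('/');
    // No scheme (or ':' only appears inside the path): already a path.
    if (colon < 0 || (firstSlash >= 0 && firstSlash < colon)) return location;
    String rest = location.substring(colon + 1);
    if (!rest.startsWith("//")) return rest;  // e.g. "file:/tmp/x"
    // Skip the authority, e.g. "nn:8020" or "bucket".
    int pathStart = rest.indexOf('/', 2);
    return pathStart < 0 ? "" : rest.substring(pathStart);
  }

  public static void main(String[] args) {
    String table = "s3a://bucket/wh/db/tbl";
    String file = "s3a://bucket/wh/db/tbl/data/p=1/f.parquet";
    System.out.println(relativize(table, file));  // data/p=1/f.parquet
    System.out.println(pathPart(file));           // /wh/db/tbl/data/p=1/f.parquet
    // Cross-check against java.net.URI for a location without escapes.
    System.out.println(pathPart(file).equals(URI.create(file).getPath()));  // true
    // A non-normalized location still matches the prefix but yields a
    // non-canonical result, which is why the assumption matters.
    System.out.println(relativize(table, "s3a://bucket/wh/db/tbl/./data/f.parquet"));  // ./data/f.parquet
  }
}
```

The prefix check and substring avoid the parsing and normalization work done by the Path constructor, which is the trade-off described above: it is cheap, but only correct when the input is already normalized.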
Results on an Iceberg table with 1M files and 25K partitions:
Full load: 13s->10s
Incremental load (0 files): 9s->3.5s
The hadoop.fs.Path constructor still uses significant CPU time after
the change, but mainly in functions that run in parallel, so
its effect is no longer as visible in total execution time.
See Jira for before/after flamegraphs.
Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
---
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
2 files changed, 49 insertions(+), 20 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/24052/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
Gerrit-Change-Number: 24052
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>