Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24052 )


Change subject: POC IMPALA-14792: Try avoiding Hadoop Path functions during 
Iceberg table loading
......................................................................

POC IMPALA-14792: Try avoiding Hadoop Path functions during Iceberg table 
loading

Quick and dirty solution to speed up IcebergFileMetadataLoader;
I am not sure that it is correct.
When working with a ~1M file Iceberg table, I noticed that an
incremental load that loads 0 files (e.g. after altering a table
property) takes nearly as much time as a full table load. Based
on some jstacks, hadoop.fs.Path functions were identified as the
main CPU consumers. I tried replacing them with quick variants
that assume that the parsed URL is correct and fully qualified.
Not sure what we can expect from paths in Iceberg, but I assume
that their validity should be checked at most once.
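
A minimal sketch of what such "quick variants" could look like. It is
not the code from this patch. The class and method names are
hypothetical, and it assumes every location is already a valid, fully
qualified URL string that uses '/' as separator, so plain string
slicing can stand in for Path parsing:

```java
// Hypothetical helper, for illustration only; not part of Impala or Hadoop.
final class FastPathSketch {
  private FastPathSketch() {}

  // Returns the parent directory of a file location without parsing it.
  // Assumption: fileLocation looks like scheme://authority/dir/.../file.
  static String parentOf(String fileLocation) {
    int lastSlash = fileLocation.lastIndexOf('/');
    if (lastSlash <= 0) {
      throw new IllegalArgumentException("No parent in: " + fileLocation);
    }
    return fileLocation.substring(0, lastSlash);
  }

  // Returns the last path component (the file name).
  static String fileNameOf(String fileLocation) {
    return fileLocation.substring(fileLocation.lastIndexOf('/') + 1);
  }

  // Returns the path of the file relative to the table location, or null
  // if the file is not under it. No normalization is done, so both
  // arguments must be written the same way (same scheme and authority).
  static String relativeTo(String tableLocation, String fileLocation) {
    String prefix =
        tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    if (!fileLocation.startsWith(prefix)) return null;
    return fileLocation.substring(prefix.length());
  }
}
```

The trade-off is that nothing is validated or normalized here: a
location with a different scheme spelling, a missing authority or
redundant separators would be handled differently than by
hadoop.fs.Path, which is exactly the correctness question raised above.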

Full load:                  13s->10s
Incremental load (0 files): 9s->3.5s

Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
---
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
2 files changed, 75 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/24052/1
--
To view, visit http://gerrit.cloudera.org:8080/24052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
Gerrit-Change-Number: 24052
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
