Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24052 )

Change subject: POC IMPALA-14792: Try avoiding Hadoop Path functions during Iceberg table loading
......................................................................

POC IMPALA-14792: Try avoiding Hadoop Path functions during Iceberg table loading

Quick and dirty solution to speed up IcebergFileMetadataLoader. I am
not sure if it is correct.

When working with a ~1M file Iceberg table, I noticed that an
incremental load that loads 0 files (e.g. after altering a table
property) takes nearly as much time as a full table load. Based on
some jstacks, hadoop.fs.Path functions were identified as the main CPU
consumers.

Tried replacing them with quick variants that assume that the parsed
URL is correct and fully qualified. Not sure what we can expect from
paths in Iceberg, but I assume that their validity should be checked
at most once.

Full load: 13s->10s
Incremental load (0 files): 9s->3.5s

Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
---
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
2 files changed, 75 insertions(+), 18 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/24052/1
--
To view, visit http://gerrit.cloudera.org:8080/24052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idce89117195e0fa64fdd6a6c576bce09ec2e75ea
Gerrit-Change-Number: 24052
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
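
Illustration only, not the code from this change: the diff is not part of this notification, so the sketch below is a guess at what a "quick variant" of a path operation could look like. It uses plain string operations on a location that is assumed to be a valid, fully qualified URI with no trailing slash, which is the assumption the commit message describes. The class name FastPathSketch and its methods are hypothetical and do not exist in Impala or Hadoop.

```java
// Hypothetical sketch: string-only path helpers that skip the parsing and
// normalization done when constructing Path objects.
// Assumption: every location is already a valid, fully qualified URI such as
// "hdfs://nn:8020/warehouse/db/tbl/data/f.parquet" with no trailing slash.
final class FastPathSketch {
  private FastPathSketch() {}

  /** Returns everything before the last '/'. No validation is performed. */
  static String parentOf(String location) {
    int idx = location.lastIndexOf('/');
    if (idx < 0) {
      throw new IllegalArgumentException("No '/' in location: " + location);
    }
    // Caveat: for a file directly under the filesystem root this returns
    // "scheme://authority" without the trailing '/'.
    return location.substring(0, idx);
  }

  /** Returns everything after the last '/', or the whole string if there is none. */
  static String fileNameOf(String location) {
    return location.substring(location.lastIndexOf('/') + 1);
  }

  /**
   * Returns the part of location below tableLocation, or null if location
   * is not under tableLocation. This is a pure prefix comparison, so both
   * arguments must already use the same scheme and authority.
   */
  static String relativeTo(String tableLocation, String location) {
    String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    return location.startsWith(prefix) ? location.substring(prefix.length()) : null;
  }
}
```

The trade-off is the one raised in the commit message: these helpers are only safe if path validity has already been established elsewhere, since they never normalize or qualify their input.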
