Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/22216 )
Change subject: IMPALA-11265: Part3: Remove redundancy of file descriptors in coordinators ...................................................................... Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@7 PS1, Line 7: Part3 > Should we call this Part3 if the current Part3 gets abandoned? Done http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@9 PS1, Line 9: local cat > local catalog Done http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@29 PS1, Line 29: This change > This change Done http://gerrit.cloudera.org:8080/#/c/22216/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/22216/1/common/thrift/CatalogObjects.thrift@652 PS1, Line 652: > This field seems unnecessary to me. At coordinator side we will have this i Indeed, removed. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@137 PS1, Line 137: > Probably it's fine as long as fbFileMetadata_ is only for Iceberg tables. T Ohh, I think I didn't publish the latest version of the code to this review. I ended up not doing any changes in this file. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@71 PS1, Line 71: leD > nit: by Done http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@74 PS1, Line 74: return new TIcebergFileDesc(decode(this).getFbFileMetadata().getByteBuffer()); > nit: unnecessary empty line (IMO) I found that at the beginning of functions we usually separate the Preconditions checks from the actual function body. I think this way is more readable. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@185 PS1, Line 185: } > Can you please check that we are not double translating the host indexes? F I didn't manage to get to the bottom of this but apparently without this piece of code LocalCatalogTest.testLoadIcebergFileDescriptors fails due to mismatch with the host indexes. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@398 PS1, Line 398: > Can't we use FeFsPartition.getLocation()? You're right! And then we can remove the tableLocation from the thrift too. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@623 PS1, Line 623: List<FeFsPartition> parts = hdfsTable_.getPartitions().stream() : .map(p -> (FeFsPartition)p) : .collect(Collectors.toList()); > 1 Iceberg table always contains only 1 FeFsPartition in the current impleme We could, but then in case we decide to implement the partition-wise breakdown of file descriptors for an Iceberg table, this would be one more place to change. It's more future proof to assume any number of partitions and do a more general implementation. http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@708 PS1, Line 708: Use ContentFile path > Could it have been a relative path earlier? Should we mention in the doc co It was full path before too. The reason why this change was needed is that for our test table we manually remove the "hdfs://localhost:20500" prefix in the Iceberg metadata. As a result when the IcebergContentFileStore tries to match the 2 parts of the file descriptors there wouldn't be any matches. Hence, I had to kind of restore the original full path here for the test tables. Note, for tables that we create as ususal (not added as a pre-created test table) the change in this function will not do anything. -- To view, visit http://gerrit.cloudera.org:8080/22216 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37c245d907baf1e46cb39d046c544bf50b37d581 Gerrit-Change-Number: 22216 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 15 Jan 2025 12:27:27 +0000 Gerrit-HasComments: Yes
