Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22216 )

Change subject: IMPALA-11265: Part3: Remove redundancy of file descriptors in 
coordinators
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@7
PS1, Line 7: Part3
> Should we call this Part3 if the current Part3 gets abandoned?
Done


http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@9
PS1, Line 9: local cat
> local catalog
Done


http://gerrit.cloudera.org:8080/#/c/22216/1//COMMIT_MSG@29
PS1, Line 29: This change
> This change
Done


http://gerrit.cloudera.org:8080/#/c/22216/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/22216/1/common/thrift/CatalogObjects.thrift@652
PS1, Line 652:
> This field seems unnecessary to me. At coordinator side we will have this i
Indeed, removed.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@137
PS1, Line 137:
> Probably it's fine as long as fbFileMetadata_ is only for Iceberg tables. T
Ohh, I think I didn't publish the latest version of the code to this review. I 
ended up not doing any changes in this file.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@71
PS1, Line 71: leD
> nit: by
Done


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@74
PS1, Line 74:       return new 
TIcebergFileDesc(decode(this).getFbFileMetadata().getByteBuffer());
> nit: unnecessary empty line (IMO)
I found that at the beginning of functions we usually separate the 
Preconditions checks from the actual function body. I think this way is more 
readable.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@185
PS1, Line 185:         }
> Can you please check that we are not double translating the host indexes? F
I didn't manage to get to the bottom of this but apparently without this piece 
of code LocalCatalogTest.testLoadIcebergFileDescriptors fails due to mismatch 
with the host indexes.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@398
PS1, Line 398:
> Can't we use FeFsPartition.getLocation()?
You're right! And then we can remove the tableLocation from the thrift too.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@623
PS1, Line 623: List<FeFsPartition> parts = hdfsTable_.getPartitions().stream()
             :         .map(p -> (FeFsPartition)p)
             :         .collect(Collectors.toList());
> 1 Iceberg table always contains only 1 FeFsPartition in the current impleme
We could, but then in case we decide to implement the partition-wise breakdown 
of file descriptors for an Iceberg table, this would be one more place to 
change. It's more future proof to assume any number of partitions and do a more 
general implementation.


http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/22216/1/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@708
PS1, Line 708: Use ContentFile path
> Could it have been a relative path earlier? Should we mention in the doc co
It was full path before too. The reason why this change was needed is that for 
our test table we manually remove the "hdfs://localhost:20500" prefix in the 
Iceberg metadata. As a result when the IcebergContentFileStore tries to match 
the 2 parts of the file descriptors there wouldn't be any matches. Hence, I had 
to kind of restore the original full path here for the test tables.
Note, for tables that we create as ususal (not added as a pre-created test 
table) the change in this function will not do anything.



--
To view, visit http://gerrit.cloudera.org:8080/22216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37c245d907baf1e46cb39d046c544bf50b37d581
Gerrit-Change-Number: 22216
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 15 Jan 2025 12:27:27 +0000
Gerrit-HasComments: Yes

Reply via email to