Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/23345 )
Change subject: IMPALA-14160: Omit compression, encryption, EC in cache key
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/23345/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23345/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2702
PS1, Line 2702: // Ignore file_compression, it does not impact file contents.
              : scanRange.hdfs_file_split.file_compression = THdfsCompression.NONE;
              : // Ignore is_encrypted, it does not impact file contents.
              : scanRange.hdfs_file_split.unsetIs_encrypted();
              : // Ignore erasure coding, it does not impact file contents.
              : scanRange.hdfs_file_split.unsetIs_erasure_coded();
> Are the semantics of light-weight listing explained somewhere? If we can de

It's supposed to be an implementation detail of the Ozone client/server communication. It's used when listing status (as opposed to located status, i.e. the listWithLocations flag in our code at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L204). We primarily list status without locations when refreshing existing locations. getFileDescriptor then reuses the existing file descriptor, and its locations, if the file hasn't changed (based on file length or modification time):

  if (listWithLocations || forceRefreshLocations || fd == null ||
      fd.isChanged(fileStatus)) {
    fd = createFd(fs, fileStatus, relPath, numUnknownDiskIds);
    ++loadStats_.loadedFiles;
  } else {
    ++loadStats_.skippedFiles;
  }

What happens in this test is:
1. Insert a second file into the table.
2. Refresh the table, since there's a cached file. That refresh reuses the fd for the first file (with isEncrypted=true) but uses the status response for the second file (isEncrypted=false).
3. Run a select, which uses the refreshed values in the cache key during planning.
4. Invalidate metadata, which resets tables.
5. The next select uses listWithLocations, so all files show isEncrypted=true.

--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3595003b9e1fbbd95524b196db002b857acd7870
Gerrit-Change-Number: 23345
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Yida Wu
Gerrit-Comment-Date: Tue, 26 Aug 2025 16:56:06 +0000
Gerrit-HasComments: Yes
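To make the five-step test sequence in the reply concrete, here is a minimal Java sketch of why an un-normalized cache key changes between the REFRESH and the INVALIDATE METADATA steps, and why clearing is_encrypted (as PS1 does at HdfsScanNode.java line 2702) keeps it stable. Everything in it is hypothetical: CacheKeySketch, Fd, list, load, cacheKey and scenarioKeys are illustrative stand-ins, not Impala classes. The reuse rule mirrors the quoted getFileDescriptor condition with forceRefreshLocations omitted, and the assumption that a light-weight listing reports isEncrypted=false for an encrypted file is taken from the test behavior described above, not from Ozone documentation. It uses Java records, so it needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the REFRESH / INVALIDATE METADATA sequence; not Impala code.
final class CacheKeySketch {
  // Simplified file descriptor holding only the fields this sketch needs.
  record Fd(String path, long length, long mtime, boolean isEncrypted) {
    // Stand-in for fd.isChanged(fileStatus): compares length and modification time.
    boolean isChanged(Fd status) {
      return length != status.length() || mtime != status.mtime();
    }
  }

  // Hypothetical listing of an encrypted file. Assumption: a light-weight listing
  // (withLocations=false) reports isEncrypted=false; a located listing reports true.
  static Fd list(String path, long length, long mtime, boolean withLocations) {
    return new Fd(path, length, mtime, withLocations);
  }

  // Mirrors the quoted reuse rule (forceRefreshLocations omitted): keep the old
  // descriptor unless locations were listed, none exists, or the file changed.
  static Map<String, Fd> load(Map<String, Fd> existing, List<Fd> listing,
      boolean withLocations) {
    Map<String, Fd> loaded = new LinkedHashMap<>();
    for (Fd status : listing) {
      Fd fd = existing.get(status.path());
      if (withLocations || fd == null || fd.isChanged(status)) {
        fd = status;
      }
      loaded.put(status.path(), fd);
    }
    return loaded;
  }

  // Builds a cache key from the descriptors. normalize=true leaves out isEncrypted,
  // analogous to calling unsetIs_encrypted() before the key is computed.
  static String cacheKey(Map<String, Fd> fds, boolean normalize) {
    StringBuilder key = new StringBuilder();
    for (Fd fd : fds.values()) {
      key.append(fd.path()).append(':').append(fd.length()).append(':').append(fd.mtime());
      if (!normalize) {
        key.append(":enc=").append(fd.isEncrypted());
      }
      key.append(';');
    }
    return key.toString();
  }

  // Returns {key after REFRESH, key after INVALIDATE METADATA} for the test steps.
  static String[] scenarioKeys(boolean normalize) {
    // Initial located load: the first file is reported as encrypted.
    Map<String, Fd> table =
        load(new LinkedHashMap<>(), List.of(list("f1", 10, 1, true)), true);
    // Steps 1-2: insert a second file, then REFRESH with a light-weight listing.
    table = load(table,
        List.of(list("f1", 10, 1, false), list("f2", 20, 2, false)), false);
    // Step 3: the select computes its cache key from the refreshed descriptors.
    String afterRefresh = cacheKey(table, normalize);
    // Steps 4-5: INVALIDATE METADATA drops descriptors; the next load lists locations.
    table = load(new LinkedHashMap<>(),
        List.of(list("f1", 10, 1, true), list("f2", 20, 2, true)), true);
    String afterInvalidate = cacheKey(table, normalize);
    return new String[] {afterRefresh, afterInvalidate};
  }

  public static void main(String[] args) {
    String[] raw = scenarioKeys(false);
    String[] normalized = scenarioKeys(true);
    // Prints "raw keys match: false"
    System.out.println("raw keys match: " + raw[0].equals(raw[1]));
    // Prints "normalized keys match: true"
    System.out.println("normalized keys match: " + normalized[0].equals(normalized[1]));
  }
}
```

In this model the raw keys differ only in the second file's enc= field (false after the REFRESH, true after the reload), while with normalization both keys are f1:10:1;f2:20:2;. Dropping the flag from the key sidesteps the listing inconsistency instead of trying to make light-weight and located listings agree.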
