Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23345 )

Change subject: IMPALA-14160: Omit compression, encryption, EC in cache key
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/23345/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/23345/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2702
PS1, Line 2702:             // Ignore file_compression, it does not impact file contents.
              :             scanRange.hdfs_file_split.file_compression = THdfsCompression.NONE;
              :             // Ignore is_encrypted, it does not impact file contents.
              :             scanRange.hdfs_file_split.unsetIs_encrypted();
              :             // Ignore erasure coding, it does not impact file contents.
              :             scanRange.hdfs_file_split.unsetIs_erasure_coded();
> Are the semantics of light-weight listing explained somewhere? If we can de
It's supposed to be an implementation detail of the Ozone client/server 
communication. It's used when listing status without locations (as opposed to 
located status; see the listWithLocations flag in our code at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L204).

We primarily use status without locations when refreshing existing locations. 
getFileDescriptor then reuses the existing locations if the file hasn't changed 
(based on file length or modification time):

    if (listWithLocations || forceRefreshLocations || fd == null ||
        fd.isChanged(fileStatus)) {
      fd = createFd(fs, fileStatus, relPath, numUnknownDiskIds);
      ++loadStats_.loadedFiles;
    } else {
      ++loadStats_.skippedFiles;
    }

What happens in this test is:
1. Insert a 2nd file into the table.
2. The table is refreshed, since there's a cached file. That refresh reuses the 
fd for the first file (with isEncrypted=true) but uses the status response for 
the 2nd file (isEncrypted=false).
3. Run select, which uses the refreshed values in the cache key during planning.
4. Invalidate metadata, which resets the tables.
5. The next select uses listWithLocations, so all files show isEncrypted=true.
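
To make the sequence above concrete, here is a minimal, self-contained sketch 
of it. It is not the Impala code: Status, Fd, list, load and invalidateMetadata 
are hypothetical stand-ins, and it assumes that only the located listing 
reports the real encryption flag while the light-weight listing reports false. 
Only the reuse condition is modeled on the snippet quoted earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LightweightListingSketch {
  /** Hypothetical stand-in for a file status returned by a listing call. */
  static final class Status {
    final String path;
    final long length;
    final long mtime;
    final boolean isEncrypted;

    Status(String path, long length, long mtime, boolean isEncrypted) {
      this.path = path;
      this.length = length;
      this.mtime = mtime;
      this.isEncrypted = isEncrypted;
    }
  }

  /** Hypothetical stand-in for a cached file descriptor. */
  static final class Fd {
    final long length;
    final long mtime;
    final boolean isEncrypted;

    Fd(Status s) {
      this.length = s.length;
      this.mtime = s.mtime;
      this.isEncrypted = s.isEncrypted;
    }

    // A file counts as changed only if its length or modification time differs.
    boolean isChanged(Status s) {
      return length != s.length || mtime != s.mtime;
    }
  }

  private final Map<String, Fd> cache = new HashMap<>();

  // Assumption: every file is really encrypted, but only the located listing
  // says so; the light-weight listing reports isEncrypted=false.
  static List<Status> list(List<String> paths, boolean withLocations) {
    List<Status> out = new ArrayList<>();
    for (String p : paths) {
      out.add(new Status(p, 100L, 1L, withLocations));
    }
    return out;
  }

  // Loads the table and returns the isEncrypted flag planning would see per file.
  List<Boolean> load(List<String> paths, boolean listWithLocations) {
    List<Boolean> flags = new ArrayList<>();
    for (Status s : list(paths, listWithLocations)) {
      Fd fd = cache.get(s.path);
      if (listWithLocations || fd == null || fd.isChanged(s)) {
        fd = new Fd(s);          // rebuilt from the current status
        cache.put(s.path, fd);
      }                          // otherwise the cached fd is reused as-is
      flags.add(fd.isEncrypted);
    }
    return flags;
  }

  void invalidateMetadata() {
    cache.clear();
  }

  public static void main(String[] args) {
    LightweightListingSketch table = new LightweightListingSketch();
    table.load(Arrays.asList("f1"), true);                         // initial load
    System.out.println(table.load(Arrays.asList("f1", "f2"), false)); // [true, false]
    table.invalidateMetadata();
    System.out.println(table.load(Arrays.asList("f1", "f2"), true));  // [true, true]
  }
}
```

Under those assumptions the same two files yield different flags before and 
after invalidate metadata, which is why leaving is_encrypted in the cache key 
would make the key unstable.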



--
To view, visit http://gerrit.cloudera.org:8080/23345

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3595003b9e1fbbd95524b196db002b857acd7870
Gerrit-Change-Number: 23345
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 26 Aug 2025 16:56:06 +0000
Gerrit-HasComments: Yes
