ramachandranms commented on a change in pull request #1332: [HUDI-409] Match
header and footer block length to improve corrupted block detection
URL: https://github.com/apache/incubator-hudi/pull/1332#discussion_r380939677
##########
File path:
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
##########
@@ -239,6 +239,15 @@ private boolean isBlockCorrupt(int blocksize) throws IOException {
 return true;
 }
+ // check if the blocksize mentioned in the footer is the same as the header; by seeking back the length of a long
+ // the backward seek does not incur additional IO as {@link org.apache.hadoop.hdfs.DFSInputStream#seek()}
+ // only moves the index. actual IO happens on the next read operation
+ inputStream.seek(inputStream.getPos() - Long.BYTES);
Review comment:
Verified that the tests fail if the footer changes in any way. Also added the
following log messages to make it easier to troubleshoot why a block was
detected as corrupted.
```
4828 [main] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block in file HoodieLogFile{pathStr='/var/folders/ch/42h2kyw10_l0509znbppmrsm0000gn/T/junit3752725921138110721/.test-fileid1_100.log.1_1-0-1', fileLen=0}. No magic hash found right after footer block size entry
4700 [main] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block in file HoodieLogFile{pathStr='/var/folders/ch/42h2kyw10_l0509znbppmrsm0000gn/T/junit8143862081174382297/.test-fileid1_100.log.1_1-0-1', fileLen=0}. Header block size(2135) did not match the footer block size(2235)
4316 [main] INFO org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block in file HoodieLogFile{pathStr='/var/folders/ch/42h2kyw10_l0509znbppmrsm0000gn/T/junit973614944657653272/.test-fileid1_100.log.1_1-0-1', fileLen=0} with block size(21350) running past EOF
```
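
For readers following along, the three log messages above correspond to three separate checks. The sketch below illustrates them against an in-memory buffer. It is not the code in this PR: the class `BlockCheckSketch`, its methods, the `#MAGIC` marker and the simplified layout (magic, block size, payload, block size repeated as a footer) are all assumptions made for illustration, and Hudi's real log block format carries more fields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified layout (an assumption, not Hudi's on-disk format):
//   MAGIC | blockSize (long) | payload | blockSize again (long) | next MAGIC or EOF
// blockSize counts the payload plus the trailing 8-byte footer entry.
final class BlockCheckSketch {
    static final byte[] MAGIC = "#MAGIC".getBytes(StandardCharsets.US_ASCII);

    /** Builds one block; footerSize lets a caller simulate a corrupted footer. */
    static byte[] block(byte[] payload, long footerSize) {
        long blockSize = payload.length + Long.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(MAGIC.length + Long.BYTES + (int) blockSize);
        buf.put(MAGIC).putLong(blockSize).put(payload).putLong(footerSize);
        return buf.array();
    }

    /** Returns null if the block at offset looks intact, otherwise a reason string. */
    static String corruptionReason(byte[] file, int offset) {
        if (file.length - offset < MAGIC.length + Long.BYTES) {
            return "truncated block header";
        }
        ByteBuffer in = ByteBuffer.wrap(file);
        in.position(offset + MAGIC.length);
        long blockSize = in.getLong();

        // Check 1: the size declared in the header must fit inside the file.
        if (blockSize < Long.BYTES || blockSize > file.length - in.position()) {
            return "block size(" + blockSize + ") is invalid or running past EOF";
        }

        // Check 2: jump to the end of the block, step back one long,
        // and compare the footer's size entry with the header's.
        int blockEnd = in.position() + (int) blockSize;
        in.position(blockEnd - Long.BYTES);
        long footerSize = in.getLong();
        if (footerSize != blockSize) {
            return "Header block size(" + blockSize
                + ") did not match the footer block size(" + footerSize + ")";
        }

        // Check 3: after the footer there must be either EOF or the next magic.
        if (in.remaining() == 0) {
            return null;
        }
        if (in.remaining() >= MAGIC.length) {
            byte[] next = new byte[MAGIC.length];
            in.get(next);
            if (Arrays.equals(next, MAGIC)) {
                return null;
            }
        }
        return "No magic hash found right after footer block size entry";
    }
}
```

Calling `corruptionReason` on a block built with a matching footer returns `null`. Passing a different `footerSize` to `block`, truncating the array, or appending non-magic bytes triggers the footer-mismatch, EOF and missing-magic reasons respectively.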