Hastyshell opened a new pull request, #61387:
URL: https://github.com/apache/doris/pull/61387

   ## Proposed changes
   
   Backport of #61386 to `branch-4.0`.
   
   The three-tier retry logic in `Segment::_open()` (static method) was 
structured as `if-else-if`, so when `open_file()` succeeded but 
`_parse_footer()` returned `CORRUPTION`, the retry branch was unreachable.
   
   ### Root cause
   
   ```cpp
   // Before: if-else-if structure
   auto st = fs->open_file(path, &file_reader, &reader_options);
   if (st) {                                  // open_file succeeded (almost 
always)
       segment->_file_reader = ...;
       st = segment->_open(stats);            // _parse_footer() → CORRUPTION 
stored in st
   } else if (st.is<CORRUPTION>() && ...) {   // UNREACHABLE: already entered 
`if` branch
       // Tier 1: clear cache, retry
       // Tier 2: bypass cache, read remote directly
   }
   RETURN_IF_ERROR(st);  // CORRUPTION returned without any retry
   ```
   
   `open_file()` only opens a file handle and rarely returns `CORRUPTION`. The 
actual footer checksum validation happens inside `_parse_footer()` (called via 
`segment->_open()`). Because the retry was in an `else if` guarded by the same 
`st` from `open_file()`, it was never reachable for the common 
`_parse_footer()` corruption case.
   
   ### Fix
   
   Change `else if` to a separate `if` block, so CORRUPTION from either 
`open_file()` or `_parse_footer()` triggers the three-tier retry.
   
   ### Issue
   
   Observed in cloud (S3) deployments (`branch-4.0`): schema change fails with 
`CORRUPTION: Bad segment file footer checksum not match`. Log analysis 
confirmed that no retry log messages were ever emitted, consistent with this 
code bug.
   
   ## Tests
   
   - Added `TestFooterCorruptionTriggersRetry` to `segment_corruption_test.cpp`
   - Uses the existing `Segment::parse_footer:magic_number_corruption` sync 
point to corrupt the footer magic number on the first `_parse_footer()` call 
only
   
   ## Checklist
   
   - [x] Does it affect the original behavior? Yes — previously CORRUPTION from 
`_parse_footer()` was never retried; now it correctly follows the three-tier 
retry
   - [x] Has unit test been added? Yes
   - [x] Is this a backport? Yes, backport of #61386


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to