hachikuji commented on pull request #9590:
URL: https://github.com/apache/kafka/pull/9590#issuecomment-790152137
@jolshan In `Log.completeSwapOperations`, we have the following comment:
```
// We create swap files for two cases:
// (1) Log cleaning where multiple segments are merged into one, and
// (2) Log splitting where one segment is split into multiple.
//
// Both of these mean that the resultant swap segments be composed of
the original set, i.e. the swap segment
// must fall within the range of existing segment(s). If we cannot
find such a segment, it means the deletion
// of that segment was successful. In such an event, we should simply
rename the .swap to .log without having to
// do a replace with an existing segment.
```
This is no longer accurate following the change here. As I understand it,
the swap recovery logic looks for each file with the `.swap` extension. It then
tries to find all log segments which have been deleted by the compaction
process by searching for any segments which have an end offset larger than the
swap segment's base offset, but less than the offset of the first record in the
swapped segment. This is handling the case when the broker crashes before it
could delete the old segments.
The recovery algorithm is described in the comment above `replaceSegments`.
I believe the logic still works correctly if we change it so that segments that
are smaller than the first swap segment get removed. This should work because
the swap suffix is removed in descending order, which means that the smallest
swap file is the last one to be updated. I think we will need some test cases
to ensure that the various cases work correctly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]