dhruvilshah3 opened a new pull request #10388: URL: https://github.com/apache/kafka/pull/10388
When we find a `.swap` file on startup, we typically want to rename and replace it as `.log`, `.index`, `.timeindex`, etc. as a way to complete any ongoing replace operations. These swap files are usually known to have been flushed to disk before the replace operation begins. One flaw in the current logic is that we recover these swap files on startup and as part of that, end up truncating the producer state and rebuild it from scratch. This is unneeded as the replace operation does not mutate the producer state by itself. It is only meant to replace the `.log` file along with corresponding indices. Because of this unneeded producer state rebuild operation, we have seen multi-hour startup times for clusters that have large compacted topics. This patch fixes the issue by doing a sanity check of all records in the segment to swap and rebuilds corresponding indices without mutating the producer state. Similarly, we also rebuild indices without truncating the producer state when we find a missing or corrupted index in the middle of the log. The patch also adds an extra sanity check to detect invalid bytes at the end of swap segments. Before this patch, we would truncate invalid bytes from the swap segment which could leave us with holes in the log. Because this is an unexpected scenario, we now raise an exception in such cases which will fail the broker on startup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org