gardnervickers commented on pull request #7929:
URL: https://github.com/apache/kafka/pull/7929#issuecomment-703966954


   @junrao I was able to get some more time to pick this back up; apologies for the long gap since my last update.
   
   In the most recent commit, I've moved away from tracking producer state snapshot files on the segment and instead maintain an in-memory cache using the `ProducerStateManager`.
   
   There were a few reasons for doing this. First, it became awkward to track the producer state snapshot file that we emit during clean shutdown, since it does not really have an associated segment file. Second, the producer state truncation/restore logic would need to move into the `Log`, since a complete view of all producer state snapshot files is required. This is further complicated by the way we handle corrupt snapshot files. Lastly, because we use a "fake" `ProducerStateManager` during segment recovery, we'd have to duplicate much of the logic that exists today to handle truncation/loading outside of the segment lifecycle.
   
   Instead, I opted to take an approach similar to the way the `Log` manages segments, using a `ConcurrentNavigableMap` to track snapshot files in memory (a rough sketch is below). As a result, the logic for truncation and restore remains largely the same, but instead of scanning the log directory on every operation we query the in-memory map. Deletions are handled the same way as segment deletions: the snapshot file is deleted asynchronously along with the segment file. Because we scan the log directory at startup for "stray" snapshot files, it's unnecessary to rename snapshot files pending deletion with the `.deleted` suffix.
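   To make the shape of that cache concrete, here is a minimal sketch of the idea (my illustration only, not the code in this PR; the class and method names are made up):

```scala
import java.io.File
import java.util.concurrent.ConcurrentSkipListMap
import scala.jdk.CollectionConverters._

// Illustrative stand-in for the in-memory snapshot tracking: snapshot files
// keyed by offset, mirroring how Log keeps segments in a ConcurrentNavigableMap.
class SnapshotCache(logDir: File) {
  private val snapshots = new ConcurrentSkipListMap[java.lang.Long, File]()

  // Populated once by scanning the log directory at startup; afterwards all
  // lookups hit this map rather than the filesystem.
  def loadFromDir(): Unit =
    Option(logDir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".snapshot"))
      .foreach { f =>
        snapshots.put(f.getName.stripSuffix(".snapshot").toLong, f)
      }

  def addSnapshot(offset: Long, file: File): Unit = snapshots.put(offset, file)

  def latestSnapshotOffset: Option[Long] =
    Option(snapshots.lastEntry()).map(_.getKey.longValue())

  // Drop snapshots at or below the given offset from the map and hand the files
  // back so the caller can delete them asynchronously, alongside the segment files.
  def removeSnapshotsUpTo(offset: Long): Seq[File] = {
    val older = snapshots.headMap(offset, true)
    val files = older.values().asScala.toList
    older.clear()
    files
  }
}
```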
   
   This approach has two downsides which I think are relatively minor.
   
   1. When a broker shuts down cleanly and emits a snapshot file, that file is considered "stray" on the next broker startup. We clean up all "stray" snapshot files except the most recent one, so a single "stray" snapshot file remains until the next broker restart, at which point it is deleted.
   2. Because we construct a temporary `ProducerStateManager` during segment recovery, and it may delete or create snapshot files, we need to re-scan the log directory for snapshot files after segment loading is completed but before we load producer state. This ensures the in-memory mapping for the "real" `ProducerStateManager` is up to date (see the ordering sketch after this list).
   
   Snapshot file deletion is triggered via `Log.deleteSegmentFiles` when segments are deleted due to retention. When swapping log segments into the log (as with compaction), it appears we have an additional constraint: we don't want to delete snapshot files purely based on the base offset of the deleted segment file. To handle this case, we check whether the segment being deleted due to the swap has a counterpart new segment being swapped in; if it does, we do not delete the snapshot file (sketched below).

