[ 
https://issues.apache.org/jira/browse/KAFKA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628093#comment-16628093
 ] 

ASF GitHub Bot commented on KAFKA-7322:
---------------------------------------

xiowu0 opened a new pull request #5694: KAFKA-7322; This is a followup Fix to 
previous patch.
URL: https://github.com/apache/kafka/pull/5694
 
 
   KAFKA-7322; This is a follow-up fix. With the previous fix, the log 
retention thread can throw an illegal state exception when it races against 
topic deletion or truncation.
   
   The following race can happen:
   1) log retention sets a topic partition to the paused state
   2) topic deletion arrives, sees that the partition is already paused, and 
proceeds
   3) topic deletion removes the paused state
   4) log retention tries to resume the same topic partition from the NONE 
state and throws an exception
   
   To fix this, we allow a topic partition to be paused multiple times. In 
addition, a concurrent unit test is added to exercise the race condition.
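
The idea of allowing multiple pauses can be sketched as a per-partition pause counter. This is only an illustration under stated assumptions, not the code in the pull request: the class `PauseTracker`, its methods, and the use of plain strings as partition names are all hypothetical. Each pause increments a count and each resume decrements it, so a resume from one thread no longer depends on another thread having left the paused state in place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not Kafka's actual cleaner manager code.
final class PauseTracker {
    // Number of outstanding pauses per partition; absent means not paused.
    private final Map<String, Integer> pauseCounts = new HashMap<>();

    // Pausing an already-paused partition increments its count.
    synchronized void pause(String partition) {
        pauseCounts.merge(partition, 1, Integer::sum);
    }

    // Resuming decrements the count; the partition leaves the paused
    // state only when the last outstanding pause is released.
    synchronized void resume(String partition) {
        Integer count = pauseCounts.get(partition);
        if (count == null) {
            throw new IllegalStateException(partition + " is not paused");
        }
        if (count == 1) {
            pauseCounts.remove(partition);
        } else {
            pauseCounts.put(partition, count - 1);
        }
    }

    // Returns 0 for a partition that is not paused.
    synchronized int pauseCount(String partition) {
        return pauseCounts.getOrDefault(partition, 0);
    }
}
```

In this sketch, log retention and topic deletion can both pause the same partition, and each later resumes its own pause. A resume with no matching pause still fails, which keeps genuine misuse visible.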
   
   
   Summary of testing strategy (including rationale):
   A purpose-built, highly concurrent unit test.
   Injected latency in the broker to make the race condition happen, and 
observed both that the race occurred and that the current implementation 
handled it correctly.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Fix race condition between log cleaner thread and log retention thread when 
> topic cleanup policy is updated
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7322
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7322
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>            Reporter: xiongqi wu
>            Assignee: xiongqi wu
>            Priority: Major
>             Fix For: 2.1.0
>
>
> The deletion thread grabs the log.lock when it tries to rename a log 
> segment and schedule it for actual deletion.
> The compaction thread only grabs the log.lock when it tries to replace the 
> original segments with the cleaned segment. The compaction thread doesn't 
> grab the log.lock when it reads records from the original segments to build 
> the offset map and new segments. As a result, if both the deletion and 
> compaction threads work on the same log partition, we have a race condition. 
> This race happens when the topic cleanup policy is updated on the fly.  
> One case that hits this race condition:
> 1: the topic cleanup policy is "compact" initially 
> 2: the log cleaner (compaction) thread picks up the partition for compaction 
> and is still in progress
> 3: the topic cleanup policy is updated to "delete"
> 4: the retention thread picks up the topic partition and deletes some old 
> segments.
> 5: the log cleaner thread reads from the deleted log and raises an IO 
> exception. 
>  
> The proposed solution is to use the "inprogress" map that the cleaner manager 
> has to protect against such a race.
>  
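
The proposed use of an in-progress map can be sketched as follows. This is a minimal illustration, not the Kafka implementation: the class `InProgressGuard`, the `State` enum, and the method names are assumptions made for the example. A thread must register a partition in the map before touching its segments, and a second thread that finds the partition already registered skips it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of guarding a partition with an in-progress map.
final class InProgressGuard {
    enum State { COMPACTING, DELETING }

    // Partitions that some thread is currently working on.
    private final Map<String, State> inProgress = new HashMap<>();

    // Returns true if the caller registered the partition and may proceed,
    // false if another thread is already working on it.
    synchronized boolean tryStart(String partition, State state) {
        return inProgress.putIfAbsent(partition, state) == null;
    }

    // Releases the partition so that another thread can pick it up.
    synchronized void finish(String partition) {
        inProgress.remove(partition);
    }
}
```

With such a guard, the retention thread in step 4 would fail to register the partition while compaction is still in progress, and would leave the old segments alone until the cleaner calls `finish`.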



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
