[ 
https://issues.apache.org/jira/browse/KAFKA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144625#comment-17144625
 ] 

Sophie Blee-Goldman commented on KAFKA-10198:
---------------------------------------------

[~rhauch] the fix has been merged and picked to the 2.6 branch

> Dirty tasks may be recycled instead of closed
> ---------------------------------------------
>
>                 Key: KAFKA-10198
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10198
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Assignee: Sophie Blee-Goldman
>            Priority: Blocker
>             Fix For: 2.6.0
>
>
> We recently added a guard to `Task#closeClean` to make sure we don't 
> accidentally clean-close a dirty task, but we forgot to also add this check 
> to `Task#closeAndRecycleState`. This meant an otherwise dirty task could be 
> closed clean and recycled into a new task when it should have just been 
> closed.
> This manifest as an NPE in our test application. Specifically, task 1_0 was 
> active on StreamThread-2 but reassigned as a standby. During handleRevocation 
> we hit a TaskMigratedException while flushing the tasks and bailed on trying 
> to flush and commit the remainder. This left task 1_0 with dirty keys in the 
> suppression buffer and the `commitNeeded` flag still set to true.
> During handleAssignment, we should have closed all the tasks with pending 
> state as dirty (ie any task with commitNeeded = true). Since we don't know 
> about the TaskMigratedException we hit during handleRevocation, we rely on 
> the guard in Task#closeClean` to throw an exception and force the task to be 
> closed dirty.
> Unfortunately, we left this guard out of `closeAndRecycleState`, which meant 
> task 1_0 was able to slip through without being closed dirty. Once 
> reinitialized as a standby task, we eventually tried to commit it. The 
> suppression buffer of course tried to flush its remaining dirty keys from its 
> previous life as an active task. But since it's now a standby task, it should 
> not be sending anything to the changelog and has a null RecordCollector. We 
> tried to access it, and hit the NPE.
>  
> The fix is simple, we just need to add the guard in closeClean to 
> closeAndRecycleState as well



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to