[ https://issues.apache.org/jira/browse/KAFKA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144472#comment-17144472 ]
Randall Hauch commented on KAFKA-10198: --------------------------------------- Thanks, [~ableegoldman]. I agree this should be fixed in 2.6, so merge whenever the PR is ready and cherry-pick to the `2.6` branch. (I'm the 2.6 release manager.) > Dirty tasks may be recycled instead of closed > --------------------------------------------- > > Key: KAFKA-10198 > URL: https://issues.apache.org/jira/browse/KAFKA-10198 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Sophie Blee-Goldman > Assignee: Sophie Blee-Goldman > Priority: Blocker > Fix For: 2.6.0 > > > We recently added a guard to `Task#closeClean` to make sure we don't > accidentally clean-close a dirty task, but we forgot to also add this check > to `Task#closeAndRecycleState`. This meant an otherwise dirty task could be > closed clean and recycled into a new task when it should have just been > closed. > This manifest as an NPE in our test application. Specifically, task 1_0 was > active on StreamThread-2 but reassigned as a standby. During handleRevocation > we hit a TaskMigratedException while flushing the tasks and bailed on trying > to flush and commit the remainder. This left task 1_0 with dirty keys in the > suppression buffer and the `commitNeeded` flag still set to true. > During handleAssignment, we should have closed all the tasks with pending > state as dirty (ie any task with commitNeeded = true). Since we don't know > about the TaskMigratedException we hit during handleRevocation, we rely on > the guard in Task#closeClean` to throw an exception and force the task to be > closed dirty. > Unfortunately, we left this guard out of `closeAndRecycleState`, which meant > task 1_0 was able to slip through without being closed dirty. Once > reinitialized as a standby task, we eventually tried to commit it. The > suppression buffer of course tried to flush its remaining dirty keys from its > previous life as an active task. But since it's now a standby task, it should > not be sending anything to the changelog and has a null RecordCollector. We > tried to access it, and hit the NPE. > > The fix is simple, we just need to add the guard in closeClean to > closeAndRecycleState as well -- This message was sent by Atlassian Jira (v8.3.4#803005)