[ https://issues.apache.org/jira/browse/KAFKA-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang resolved KAFKA-9113. ---------------------------------- Fix Version/s: 2.6.0 Resolution: Fixed This clean is done in trunk (2.6.0) now. > Clean up task management > ------------------------ > > Key: KAFKA-9113 > URL: https://issues.apache.org/jira/browse/KAFKA-9113 > Project: Kafka > Issue Type: Improvement > Components: streams > Affects Versions: 2.4.0 > Reporter: Sophie Blee-Goldman > Assignee: Sophie Blee-Goldman > Priority: Major > Fix For: 2.6.0 > > > Along KIP-429 we did a lot of refactoring of the task management classes, > including the TaskManager and AssignedTasks (and children). While hopefully > easier to reason about there's still significant opportunity for further > cleanup including safer state tracking. Some potential improvements: > 1) Verify that no tasks are ever in more than one state at once. One > possibility is to just check that the suspended, created, restoring, and > running maps are all disjoint, but this begs the question of when and where > to do those checks, and how often. Another idea might be to put all tasks > into a single map and just track their state on a per-task basis. Whatever it > is should be aware that some methods are on the critical code path, and > should not be burdened with excessive safety checks (ie > AssignedStreamTasks#process). Alternatively, it seems to make sense to just > make each state its own type. We can then do some cleanup of the AbstractTask > and StreamTask classes, which currently contain a number of methods specific > to only one type/state of task. For example > * only active running tasks ever need to be suspendable, yet every task does > through suspend then closeSuspended during close. > * as the name suggests, closeSuspended should technically only ever apply to > suspended tasks > * the code paths needed to perform certain actions such as closing or > committing a task vary widely between the different states. A restoring task > need only close its state manager, but skipping the task.close call and > calling only closeStateManager has lead to confusion and time wasted trying > to remember or ask someone why that is sufficient > 2) Cleanup of closing and/or shutdown logic – there are some potential > improvements to be made here as well, for example AssignedTasks currently > implements a closeZombieTask method despite the fact that standby tasks are > never zombies. > 3) The StoreChangelogReader also interacts with (only) the > AssignedStreamsTasks class, through the TaskManager. It can be difficult to > reason about these interactions and the state of the changelog reader. > 4) All 4 classes and their state have very strict consistency requirements > that currently are almost impossible to verify, which has already resulted in > several bugs that we were lucky to catch in time. We should tighten up how > these classes manage their own state, and how the overall state is managed > between them, so that it is easy to make changes without introducing new bugs > because one class updated its own state without knowing it needed to tell > another class to also update its -- This message was sent by Atlassian Jira (v8.3.4#803005)