[ 
https://issues.apache.org/jira/browse/KAFKA-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964468#comment-16964468
 ] 

Guozhang Wang commented on KAFKA-9113:
--------------------------------------

Also related to the above call: we should refactor the {{StreamTask.close}} 
function to be more robust to just close whatever is needed for a restoring 
task, or suspended, etc. More specifically we consider a task be:

1. created: topology not initialized, states not initialized. -> when close, 
basically nothing needs to be done
2. restoring: topology not initialized, states initialized. -> when close, only 
need to close the state manager
3. running: topology initialized, states initialized. -> when close, first 
suspend (close topology), and then close suspended
4. suspended: topology closed, states not closed. -> it is similar to 
restoring: we just call closeSuspended which only need to close the state 
manager

> Clean up task management
> ------------------------
>
>                 Key: KAFKA-9113
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9113
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.4.0
>            Reporter: Sophie Blee-Goldman
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>
> Along KIP-429 we did a lot of refactoring of the task management classes, 
> including the TaskManager and AssignedTasks (and children).  While hopefully 
> easier to reason about there's still significant opportunity for further 
> cleanup including safer state tracking.  Some potential improvements:
> 1) Verify that no tasks are ever in more than one state at once. One 
> possibility is to just check that the suspended, created, restoring, and 
> running maps are all disjoint, but this begs the question of when and where 
> to do those checks, and how often. Another idea might be to put all tasks 
> into a single map and just track their state on a per-task basis. Whatever it 
> is should be aware that some methods are on the critical code path, and 
> should not be burdened with excessive safety checks (ie 
> AssignedStreamTasks#process). Alternatively, it seems to make sense to just 
> make each state its own type. We can then do some cleanup of the AbstractTask 
> and StreamTask classes, which currently contain a number of methods specific 
> to only one type/state of task. For example
>  * only active running tasks ever need to be suspendable, yet every task does 
> through suspend then closeSuspended during close.
>  * as the name suggests, closeSuspended should technically only ever apply to 
> suspended tasks
>  * the code paths needed to perform certain actions such as closing or 
> committing a task vary widely between the different states. A restoring task 
> need only close its state manager, but skipping the task.close call and 
> calling only closeStateManager has lead to confusion and time wasted trying 
> to remember or ask someone why that is sufficient
> 2) Cleanup of closing and/or shutdown logic – there are some potential 
> improvements to be made here as well, for example AssignedTasks currently 
> implements a closeZombieTask method despite the fact that standby tasks are 
> never zombies. 
> 3)  The StoreChangelogReader also interacts with (only) the 
> AssignedStreamsTasks class, through the TaskManager. It can be difficult to 
> reason about these interactions and the state of the changelog reader.
> 4) All 4 classes and their state have very strict consistency requirements 
> that currently are almost impossible to verify, which has already resulted in 
> several bugs that we were lucky to catch in time. We should tighten up how 
> these classes manage their own state, and how the overall state is managed 
> between them, so that it is easy to make changes without introducing new bugs 
> because one class updated its own state without knowing it needed to tell 
> another class to also update its



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to