Greg Fodor created KAFKA-4043: --------------------------------- Summary: User-defined handler for topology restart Key: KAFKA-4043 URL: https://issues.apache.org/jira/browse/KAFKA-4043 Project: Kafka Issue Type: Improvement Components: streams Affects Versions: 0.10.0.1 Reporter: Greg Fodor Assignee: Guozhang Wang
Since Kafka Streams is just a library, there's a lot of cool stuff we've been able to do that would be trickier if it were part of a larger cluster-oriented job execution system that had assumptions about the semantics of a job. One of the jobs we have uses Kafka Streams to do top level data flow, and then one of our processors actually will kick off background threads to do work based upon the data flow state. Happy to fill in more details of our use-case, but fundamentally the model is that we have a Kafka Streams data flow that is reading state from upstream, and that state dictates that work needs to be done, which results in a dedicated work thread to be spawned by our job. This works great, but we're running into an issue when there is partition reassignment, since we have no way to detect this and cleanly shut down these threads. In our case, we'd like to shut down the background worker threads if there is a partition rebalance or if the job raises an exception and attempts to restart. In practice what is happening is we are getting duplicate threads for the same work on a partition rebalance. Implementation-wise, this seems like some type of event handler that can be attached to the topology at build time that can will be called when the data flow needs to rebalance or rebuild its task threads in general (ideally passing as much information about the reason along.) I could imagine this being factored similarly to the KafkaStreams#setUncaughtExceptionHandler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)