Richard Yu created KAFKA-9285: --------------------------------- Summary: Implement failed message topic to account for processing lag during failure Key: KAFKA-9285 URL: https://issues.apache.org/jira/browse/KAFKA-9285 Project: Kafka Issue Type: New Feature Components: consumer Reporter: Richard Yu
Presently, in current Kafka failure schematics, when a consumer crashes, the user is typically responsible for both detecting as well as restarting the failed consumer. Therefore, during this period of time, when the consumer is dead, it would result in a period of inactivity where no records are consumed, hence lag results. Previously, there has been attempts to resolve this problem: when failure is detected by broker, a substitute consumer will be started (the so-called [Rebalance Consumer|[https://cwiki.apache.org/confluence/display/KAFKA/KIP-333%3A+Add+faster+mode+of+rebalancing]]) which will continue processing records in Kafka's stead. However, this has complications, as records will only be stored locally, and in case of this consumer failing as well, that data will be lost. Instead, we need to consider how we can still process these records and at the same time effectively _persist_ them. It is here that I propose the concept of a _failed message topic._ At a high level, it works like this. When we find that a consumer has failed, messages which was originally meant to be sent to that consumer would be redirected to this failed messaged topic. The user can choose to assign consumers to this topic, which would consume messages from failed consumers while other consumer threads are down. Naturally, records from different topics can not go into the same failed message topic, since we cannot tell which records belong to which consumer. -- This message was sent by Atlassian Jira (v8.3.4#803005)