I'm looking at message delivery patterns for Kafka consumers and wanted to get people's thoughts on the following problem:
The objective is to process individual messages with as much certainty as possible, i.e. "at least once" guarantees. I'd like a Kafka consumer to pull n messages (say 100 for argument's sake), process them, commit the offset, then grab 100 more. The issue arises when a single message fails: for example, message 30 cannot be deserialized, message 40 failed because some third-party service was down for an instant, etc.

To handle this we're looking at a topic / topic_retry pattern for consumers: on a single-message failure we'd put messages 30 and 40 on the retry topic with a failure count of 1, and once that failure count exceeds 3 the message goes to cold storage for manual analysis. Only after all 100 messages have been handled, either successfully or by confirming they were re-enqueued, do we commit the offset and grab the next batch. (I've put a rough sketch of the loop I have in mind in a P.S. below.)

On top of that, if the percentage of messages landing on the retry topic goes over a threshold, we'd trip a circuit breaker so the consumer stops pulling messages until the issue is resolved, to prevent retry flooding.

What patterns are people currently using to handle message failures at scale with Kafka? Pardon if this is a frequent question, but the http://search-hadoop.com/kafka server is down so I can't search the archives at the moment.

thanks, Jim
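P.S. For concreteness, here is roughly the consume/retry loop I have in mind, using the Java client. The topic names (orders, orders_retry, orders_dead), the MAX_RETRIES constant, and carrying the failure count in a record header are just illustrative assumptions, not a worked-out implementation:

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class RetryingBatchConsumer {
    static final int MAX_RETRIES = 3;          // past this, park for manual analysis

    public static void main(String[] args) throws Exception {
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "orders-processor");
        c.put("enable.auto.commit", "false");  // commit manually, once per batch
        c.put("max.poll.records", "100");      // "pull n messages, assuming 100"
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Keep values as raw bytes and deserialize in application code, so one
        // poison message is caught per-record instead of failing the whole poll().
        c.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("acks", "all");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(c);
             KafkaProducer<String, byte[]> producer = new KafkaProducer<>(p)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, byte[]> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> rec : batch) {
                    try {
                        process(rec);                  // deserialize + business logic
                    } catch (Exception e) {
                        requeue(producer, rec);        // blocks until the broker acks
                    }
                }
                consumer.commitSync();                 // offset moves only after every message
            }                                          // succeeded or was re-enqueued
        }
    }

    // Bump the failure count and route to the retry topic, or to cold storage
    // once the count would exceed MAX_RETRIES.
    static void requeue(KafkaProducer<String, byte[]> producer,
                        ConsumerRecord<String, byte[]> rec) throws Exception {
        int attempts = retryCount(rec) + 1;
        String target = attempts > MAX_RETRIES ? "orders_dead" : "orders_retry";
        ProducerRecord<String, byte[]> out = new ProducerRecord<>(target, rec.key(), rec.value());
        out.headers().add("retry-count",
                Integer.toString(attempts).getBytes(StandardCharsets.UTF_8));
        producer.send(out).get();                      // synchronous: the at-least-once handoff
    }

    static int retryCount(ConsumerRecord<String, byte[]> rec) {
        Header h = rec.headers().lastHeader("retry-count");
        return h == null ? 0 : Integer.parseInt(new String(h.value(), StandardCharsets.UTF_8));
    }

    static void process(ConsumerRecord<String, byte[]> rec) throws Exception {
        // deserialize rec.value() and do the real work here
    }
}

The idea is that a second instance of the same loop, subscribed to orders_retry, does the re-drive, and the retry-count header survives the round trips.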
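For the circuit-breaker piece, I'm imagining pause()/resume() on the consumer, since a paused consumer's poll() returns nothing but still heartbeats, so it stops pulling without leaving the group. The 10% threshold and the downstreamHealthy() helper below are placeholders of mine, and the snippet assumes it sits inside the poll loop above, with the for-each counting failures:

static final double RETRY_RATE_THRESHOLD = 0.10;   // assumption: trip at 10% failures

// ... inside the poll loop, after processing the batch:
int failed = 0;
for (ConsumerRecord<String, byte[]> rec : batch) {
    try { process(rec); } catch (Exception e) { requeue(producer, rec); failed++; }
}
consumer.commitSync();

if (batch.count() > 0 && (double) failed / batch.count() > RETRY_RATE_THRESHOLD) {
    consumer.pause(consumer.assignment());         // stop fetching, stay in the group
    while (!downstreamHealthy()) {                 // hypothetical health check
        consumer.poll(Duration.ofSeconds(5));      // returns empty while paused, but
    }                                              // keeps us from being rebalanced away
    consumer.resume(consumer.assignment());
}

Does that roughly match what others are doing, or is there a better-established pattern?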