[ https://issues.apache.org/jira/browse/KAFKA-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirk True reassigned KAFKA-19012:
---------------------------------

    Assignee: Kirk True

> Messages ending up on the wrong topic
> -------------------------------------
>
>                 Key: KAFKA-19012
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19012
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer
>    Affects Versions: 3.2.3, 3.8.1
>            Reporter: Donny Nadolny
>            Assignee: Kirk True
>            Priority: Major
>
> We're experiencing messages very occasionally ending up on a different topic
> than the one they were published to. That is, we publish a message to topicA
> and consumers of topicB see it and fail to parse it because the message
> contents are meant for topicA. This has happened for various topics.
>
> We've begun adding a header with the intended topic (which we get just by
> reading the topic from the record that we're about to pass to the OSS client)
> right before we call producer.send. This header shows the correct topic
> (which also matches the message contents itself). Similarly, we're able to
> use this header and compare it to the actual topic to prevent consuming these
> misrouted messages, but this is still concerning.
>
> Some details:
> - This happens rarely: it happened approximately once per 10 trillion
>   messages for a few months, though there was a period of a week or so where
>   it happened more frequently (once per 1 trillion messages or so)
> - It often happens in a small burst, e.g. 2 or 3 messages very close in time
>   (but from different hosts) will be misrouted
> - It often but not always coincides with some sort of event in the cluster
>   (a broker restarting or being replaced, network issues causing errors,
>   etc). Also, these cluster events happen quite often with no misrouted
>   messages
> - We run many clusters; it has happened on several of them
> - There is no pattern between intended and actual topic, other than that the
>   intended topics tend to be higher-volume ones (but I'd attribute that to
>   there being more messages published -> more occurrences affecting them,
>   rather than it being more likely per message)
> - It only occurs with clients that are using a non-zero linger
> - Once it happened with two sequential messages: both were intended for
>   topicA but both ended up on topicB, published by the same host (presumably
>   within the same linger batch)
> - Most of our clients are 3.2.3 and it has only affected those. Most of our
>   brokers are 3.2.3, but it has also happened with a cluster that's running
>   3.8.1 (I suspect a client rather than a broker problem, because it has
>   never happened with clients that use 0 linger)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
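
For readers who want to picture the header-based guard described in the report, here is a minimal sketch in Python. It is not the reporter's code: the header key `intended-topic` and both function names are hypothetical, and headers are modelled as a plain list of `(key, bytes)` tuples so the snippet runs without any Kafka client installed.

```python
# Hypothetical header key; the report does not say which name is used.
INTENDED_TOPIC_HEADER = "intended-topic"


def stamp_intended_topic(topic, headers=None):
    """Return a new header list that records the topic the message is meant for.

    Intended to be called right before handing the record to the producer.
    Any earlier value of the same header is replaced.
    """
    stamped = [(k, v) for k, v in (headers or []) if k != INTENDED_TOPIC_HEADER]
    stamped.append((INTENDED_TOPIC_HEADER, topic.encode("utf-8")))
    return stamped


def is_misrouted(actual_topic, headers):
    """Return True if the header names a different topic than the one consumed from.

    Records without the header cannot be checked and are treated as not misrouted.
    """
    for key, value in headers or []:
        if key == INTENDED_TOPIC_HEADER:
            return value.decode("utf-8") != actual_topic
    return False


# Producer side: stamp the record intended for topicA.
record_headers = stamp_intended_topic("topicA")

# Consumer side: the same record showing up on topicB is flagged and can be skipped.
print(is_misrouted("topicB", record_headers))
print(is_misrouted("topicA", record_headers))
```

A guard like this only detects misrouted records on the consuming side; it does not explain or prevent the misrouting itself.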