Very occasionally, we see messages end up on a different topic than the
one they were published to. That is, we publish a message to topicA, and
consumers of topicB see it and fail to parse it because the message
contents are meant for topicA. This has happened for various topics.
Searching existing bug reports hasn't turned up anything. Has anyone seen
anything like this?

We've begun adding a header with the intended topic (which we get just by
reading the topic from the record that we're about to pass to the OSS
client) right before we call producer.send. This header shows the correct
topic (which also matches the message contents). On the consumer side we
compare this header to the actual topic so that we don't consume the
misrouted messages, but replaying them to the right topic is extra work
for us, and the behavior itself is pretty concerning.
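
For anyone who wants to add the same guard, here is a minimal sketch of the
check in Java. It is not our actual code: the class name, the method names
and the "intended-topic" header key are hypothetical, and the Kafka calls
are only mentioned in comments so the snippet compiles without the client
on the classpath.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not from our code base.
final class IntendedTopicHeader {
    // Assumed header key; any name works as long as both sides agree on it.
    static final String HEADER_KEY = "intended-topic";

    // Producer side: the bytes to attach right before producer.send, e.g.
    //   record.headers().add(HEADER_KEY, encode(record.topic()));
    static byte[] encode(String topic) {
        return topic.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: pass in the header value, e.g. obtained from
    //   record.headers().lastHeader(HEADER_KEY)
    // (null if the header is absent), together with record.topic().
    // Returns true only when a header is present and names another topic.
    static boolean isMisrouted(byte[] headerValue, String actualTopic) {
        if (headerValue == null) {
            return false; // no header, nothing to compare against
        }
        String intended = new String(headerValue, StandardCharsets.UTF_8);
        return !intended.equals(actualTopic);
    }
}
```

A consumer would skip (and queue for replay) any record for which
isMisrouted returns true. Treating a missing header as "not misrouted" is a
choice made for this sketch so that records from producers without the
header still flow through.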

Some details:
 - This happens rarely: approximately once per 10 trillion messages
 - It often happens in a small burst, e.g. 2 or 3 messages very close in
time (but from different hosts) will be misrouted
 - It often, but not always, coincides with some sort of event in the
cluster (a broker restarting or being replaced, network issues causing
errors, etc). That said, these cluster events also happen quite often with
no misrouted messages
 - We run many clusters, and it has happened on several of them
 - There is no pattern between intended and actual topic, other than that
the intended topic tends to be a higher-volume one (but I'd attribute that
to there being more messages published -> more occurrences affecting it,
rather than it being more likely per message)
 - It only occurs with clients that are using a non-zero linger
 - Once it happened with two sequential messages: both were intended for
topicA but ended up on topicB, and both were published by the same host
(presumably within the same linger batch)
 - Most of our clients are 3.2.3 and it has only affected those. Our
brokers are 3.2.3 as well (but I suspect a client rather than a broker
problem, because it has never happened with clients that use 0 linger)
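
To be concrete about what "0 linger" versus "non-zero linger" means above,
here is a small sketch of the two kinds of producer settings. linger.ms is
the standard producer config key; the class name and the 5 ms value are
placeholders for illustration, not our real settings.

```java
import java.util.Properties;

// Hypothetical holder for the two kinds of producer settings being compared.
final class LingerConfigs {
    // Clients like this have never produced a misrouted message for us:
    // the producer adds no artificial delay before sending.
    static Properties noLinger() {
        Properties props = new Properties();
        props.setProperty("linger.ms", "0");
        return props;
    }

    // Clients like this are the only ones affected: the producer may wait
    // up to linger.ms for more records to batch before sending.
    static Properties withLinger() {
        Properties props = new Properties();
        props.setProperty("linger.ms", "5"); // placeholder value
        return props;
    }
}
```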

Thanks,
Donny
