I'm seeing consumers miss messages when they subscribe before the topic is
actually created.

Scenario:
1) Kafka 0.10.1.1 cluster with no topics, but with topic auto-creation
enabled, so a topic is created as soon as a message is published to it
2) consumer subscribes using a topic string or a regex pattern. Currently no
topics match. Consumer offset reset policy is "latest"
3) producer publishes to a topic that matches the string or regex pattern.
4) broker immediately creates the topic, writes the message, and also
notifies the consumer group that a rebalance needs to happen to assign the
topic_partition to one of the consumers.
5) rebalance is fairly quick, maybe a second or so
6) a consumer is assigned to the newly-created topic_partition

At this point, we've got a consumer steadily polling the recently created
topic_partition. However, consumer.poll() never returns any of the messages
published between topic creation and when the consumer was assigned to the
topic_partition. I'm guessing this may be because when the consumer is
assigned to the topic_partition it doesn't find any committed offset, so it
uses the latest offset, which happens to be after the messages that were
published to create the topic.
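
To make that guess concrete, here is a toy model in plain Python of what I
think is happening. It does not talk to Kafka at all; PartitionLog,
initial_position and simulate are names I made up for illustration, and the
reset logic is only my assumption about how a consumer picks its starting
position when the group has no committed offset.

```python
# Toy model only: PartitionLog, initial_position and simulate are
# hypothetical names, not part of any Kafka client API.


class PartitionLog:
    """An append-only log standing in for a single topic_partition."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        """Append a message and return the offset it was written at."""
        self.messages.append(message)
        return len(self.messages) - 1

    @property
    def end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self.messages)

    def read_from(self, offset):
        """Return every message at or after the given offset."""
        return self.messages[offset:]


def initial_position(log, committed_offset, reset_policy):
    """Pick the starting offset for a newly assigned consumer (assumed logic)."""
    if committed_offset is not None:
        return committed_offset
    if reset_policy == "latest":
        return log.end_offset
    if reset_policy == "earliest":
        return 0
    raise ValueError("unknown reset policy: %r" % (reset_policy,))


def simulate(reset_policy):
    """Replay the scenario and return what the consumer would read."""
    log = PartitionLog()
    # Steps 3-4: the producer publishes and the topic is auto-created.
    log.append("m0")
    # Step 5: another message arrives while the rebalance is in progress.
    log.append("m1")
    # Step 6: the consumer is assigned; the group has no committed offset.
    position = initial_position(log, None, reset_policy)
    # A message published after the assignment.
    log.append("m2")
    return log.read_from(position)


if __name__ == "__main__":
    print(simulate("latest"))    # ['m2']
    print(simulate("earliest"))  # ['m0', 'm1', 'm2']
```

In this model, "latest" skips m0 and m1 and only returns m2, which matches
what I'm seeing, while "earliest" would return all three messages.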

This is surprising because the consumer technically was subscribed to the
topic before the messages were produced, so you'd think the consumer would
receive these messages.

Is this known behavior? A bug in the Kafka broker? Or a bug in my client
library?
