Hi Edvard,

Interesting problem. I've run into similar issues with high fan-out use cases, but only in demo applications where I was more interested in scale than ordering. For examples, have a look at this list of blog posts, which includes Anomalia Machina (Kafka + Cassandra) and Kongo (Kafka + IoT):
https://www.linkedin.com/pulse/complete-guide-apache-kafka-developers-everything-i-know-paul-brebner/
Regards,
Paul

From: Edvard Fagerholm <edvard.fagerh...@gmail.com>
Date: Tuesday, 30 May 2023 at 6:55 am
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Patterns for generating ordered streams

Hi there,

Kafka is new to us, so we don't have operational experience with it. I'm building a system that broadcasts events to groups of clients, with Kafka consumers handling the broadcasting. In other words:

kafka topic --- consumers --- clients

We need to maintain the order of the events as they come through Kafka, and clients need to be able to retrieve from a DB any events posted while they were offline. This means the consumer also writes each event to a DB. The number of clients is in the hundreds of millions.

What I'm trying to understand is what I can use to sort the events in the DB. A natural candidate would be to include the Kafka partition offset as a range key in the DB and use it for sorting in queries, since it would guarantee the same order (see the first sketch at the end of this message). Consumer-generated timestamps I would not use, since they aren't idempotent: a consumer could fail before acking its last batch, and the replacement consumer would assign different timestamps to the replayed events.

A problem with the partition offset that I haven't been able to find an answer to in the docs is operational. If we ever want to move a topic to a new cluster, the partitions of the topic in the new cluster need to start their offsets where they ended in the old cluster. Is it possible to set initial offsets for each partition when creating a topic? Have other users solved this in other ways?

Another idea I've had is to keep a sequence number in memory on the consumer and store offset -> sequence-number mappings in the DB; when committing, the consumer would delete the mappings from older batches. This allows a new consumer to synchronize with any previous consumer and maintain idempotency (see the second sketch below). I'm assuming this is a topic that has concerned many Kafka users.

What we can't do is maintain a sequence number in a DB and update it on each event. The reason is that we use Cassandra for ingesting the events, and it does not support a counter that can be incremented and read without transactions, so doing that on every insert would be very expensive at the load we have.

Best,
Edvard
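For illustration, here is a minimal sketch of the first idea (the partition offset as the DB range key), assuming kafka-python and the DataStax Cassandra driver. The keyspace, table, topic, and the assumption that records are keyed by a group id are all illustrative, not details from the thread.

# Hedged sketch: write each event with its Kafka partition offset as the
# clustering column, so a range scan per group returns events in the
# original order. All names here are assumptions for the example.
from cassandra.cluster import Cluster
from kafka import KafkaConsumer

session = Cluster(["127.0.0.1"]).connect("broadcast")
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_group (
        group_id     text,
        kafka_offset bigint,
        payload      blob,
        PRIMARY KEY (group_id, kafka_offset)
    ) WITH CLUSTERING ORDER BY (kafka_offset ASC)
""")
insert = session.prepare(
    "INSERT INTO events_by_group (group_id, kafka_offset, payload) "
    "VALUES (?, ?, ?)")

consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092",
                         group_id="broadcaster", enable_auto_commit=False)
for record in consumer:
    # record.offset is stable across re-deliveries, so replaying an unacked
    # batch after a crash overwrites the same rows instead of duplicating them.
    session.execute(insert, (record.key.decode(), record.offset, record.value))
    consumer.commit()  # commit only after the write has been persisted

# A client catching up after being offline would read its backlog in order:
#   SELECT payload FROM events_by_group
#   WHERE group_id = ? AND kafka_offset > ?   -- last offset the client saw

This stability across re-deliveries is exactly what a consumer-side timestamp lacks, which is the point made in the message above.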
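And a sketch of the second idea, the in-memory sequence number with offset -> sequence-number checkpoints. Again, the table and names are hypothetical, and the pruning of old mappings after commit, which the message mentions, is left as a comment.

# Hedged sketch: keep a monotonic counter in memory, checkpoint
# (offset, seq) per record, and let a replacement consumer reuse stored
# mappings for replayed offsets so events are never renumbered.
from cassandra.cluster import Cluster
from kafka import KafkaConsumer, TopicPartition

session = Cluster(["127.0.0.1"]).connect("broadcast")
session.execute("""
    CREATE TABLE IF NOT EXISTS seq_by_offset (
        partition    int,
        kafka_offset bigint,
        seq          bigint,
        PRIMARY KEY (partition, kafka_offset)
    ) WITH CLUSTERING ORDER BY (kafka_offset DESC)
""")

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         group_id="broadcaster", enable_auto_commit=False)
consumer.assign([TopicPartition("events", 0)])

# Resume from the highest checkpoint, if any, left by a previous consumer
# (DESC clustering order makes LIMIT 1 return the newest mapping).
row = session.execute(
    "SELECT kafka_offset, seq FROM seq_by_offset WHERE partition = 0 LIMIT 1"
).one()
last_offset, seq = (row.kafka_offset, row.seq) if row else (-1, -1)

for record in consumer:
    if record.offset <= last_offset:
        # Replayed record from an unacked batch: reuse the seq the previous
        # consumer stored, keeping the event write idempotent.
        seq_for_record = session.execute(
            "SELECT seq FROM seq_by_offset "
            "WHERE partition = %s AND kafka_offset = %s",
            (record.partition, record.offset)).one().seq
    else:
        seq += 1
        seq_for_record = seq
        session.execute(
            "INSERT INTO seq_by_offset (partition, kafka_offset, seq) "
            "VALUES (%s, %s, %s)",
            (record.partition, record.offset, seq_for_record))
    # ... write the event row using seq_for_record as its sort key ...
    consumer.commit()
    last_offset = max(last_offset, record.offset)
    # After committing, mappings older than the committed offset could be
    # deleted, as the message above suggests.

One design note: this decouples the stored sort key from Kafka offsets entirely, so migrating the topic to a new cluster would not disturb the numbering, at the cost of one extra checkpoint write per event.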