Hello Wim,
It does matter, I think, because one of the big and principal features of Kafka is load balancing of messages and guaranteed ordering in a distributed cluster. The order of messages is guaranteed, except in several cases.

Data loss can happen when:
1] the producer has block.on.buffer.full = false, retries are exhausted, and messages are sent without acks=all
2] unclean leader election is enabled: if an out-of-sync follower becomes the new leader, messages that were not yet replicated to it are lost

Message reordering can happen when:
1] max.in.flight.requests.per.connection > 1 and retries are enabled
2] a producer is not closed correctly, i.e. without calling close(). The close() method ensures the accumulator is closed first, guaranteeing that no more appends are accepted after breaking the send loop.

If you want to avoid these cases:
- close the producer in the error callback
- close the producer with close(0) to prevent sending after a previous message send failed

To avoid data loss:
- block.on.buffer.full=true
- retries=Long.MAX_VALUE
- acks=all

To avoid reordering:
- max.in.flight.requests.per.connection=1 (be aware of the latency impact)

Note that if your producer goes down, messages still in its buffer will be lost (perhaps manage a local storage if you are punctilious). Moreover, at least two replicas are needed at any time to guarantee data persistence: for example, replication factor = 3, min.insync.replicas = 2, unclean leader election disabled.

Also keep in mind that a consumer can lose messages when offsets are not committed correctly.
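To make the producer side concrete, here is a minimal sketch of the loss- and reordering-avoidance settings above collected into a plain Properties object (the bootstrap address is a placeholder; note that block.on.buffer.full applies to older producer versions, while newer clients block by default and use max.block.ms instead):

```java
import java.util.Properties;

public class SafeProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Placeholder broker address -- adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        // Avoid data loss: wait for all in-sync replicas and retry indefinitely.
        props.put("acks", "all");
        props.put("retries", String.valueOf(Long.MAX_VALUE));
        // Older producer versions only; newer clients block by default (max.block.ms).
        props.put("block.on.buffer.full", "true");
        // Avoid reordering: only one in-flight request per connection
        // (trades throughput/latency for the ordering guarantee).
        props.put("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

These properties would then be passed to the KafkaProducer constructor as usual.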
Disable auto-commit (enable.auto.commit=false) and commit offsets only after the work for each message is done (or commit several processed messages at a time, holding them in a local in-memory buffer until then).

I hope these suggestions help you 😊

Best regards,
Adrien

________________________________
From: Wim Van Leuven <wim.vanleu...@highestpoint.biz>
Sent: Thursday, March 8, 2018 21:35:13
To: users@kafka.apache.org
Subject: Delayed processing

Hello,

I'm wondering how to design a KStreams or regular Kafka application that can hold off processing of messages until a future time. This is related to the EU's data protection regulation: we can store raw messages for a given time; afterwards we have to store the anonymised message.

So, I was thinking about branching the stream, anonymising the messages into a waiting topic, and then continuing from there once the retention time passes. But that approach has some caveats:
- It is not an exact solution, as the order of events is not guaranteed: we might encounter a message that triggers the stop of processing while some events arriving later should normally still pass
- How to properly stop processing if we encounter a message that indicates not to continue?
- ...

Are there better known solutions or best practices to delay message processing with Kafka streams / consumers+producers?

Thanks for any insights/help here!
-wim
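One way to make the "waiting topic" idea concrete, and to sidestep the out-of-order-trigger caveat, is to key the delay off each record's own event timestamp rather than off arrival order: a record only becomes eligible once its timestamp has aged past the retention window, regardless of what arrived before or after it. A minimal sketch of that eligibility check (the class and method names are illustrative, not from any Kafka API; the consumer loop would typically pause() the partition and retry after remainingDelayMs):

```java
public class DelayGate {
    /**
     * Returns true when a record whose event timestamp is recordTimestampMs
     * has aged past retentionMs and may be processed; false if the consumer
     * should pause the partition and retry later.
     */
    public static boolean isEligible(long recordTimestampMs, long retentionMs, long nowMs) {
        return nowMs - recordTimestampMs >= retentionMs;
    }

    /** Milliseconds to wait before the record becomes eligible (0 if it already is). */
    public static long remainingDelayMs(long recordTimestampMs, long retentionMs, long nowMs) {
        return Math.max(0L, recordTimestampMs + retentionMs - nowMs);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // A record 5 seconds old against a 4-second window is eligible.
        System.out.println(DelayGate.isEligible(now - 5_000L, 4_000L, now));
        // A brand-new record still has the full window to wait.
        System.out.println(DelayGate.remainingDelayMs(now, 4_000L, now));
    }
}
```

Because eligibility depends only on the record's own timestamp, a late-arriving but older record is judged on its own age, not on whether a "stop" trigger happened to arrive before it.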