Hey Adrien, thank you for the elaborate explanation!
We are ingesting call data records here, which, due to the nature of a
telco network, might not arrive in absolute logical order. If I
understand your explanation correctly, you are saying that with your
setup, Kafka guarantees processing in the order the messages were
ingested. Correct?

Thanks!
-wim

On Thu, 8 Mar 2018 at 22:58 adrien ruffie <adriennolar...@hotmail.fr> wrote:

> Hello Wim,
>
> It does matter (I think), because one of the big and principal features
> of Kafka is to do load balancing of messages and guarantee ordering
> (per partition) in a distributed cluster.
>
> The order of the messages should be guaranteed, except in several cases.
>
> Data loss can happen when:
>
> 1] block.on.buffer.full=false, retries are exhausted, and messages are
> sent without acks=all
>
> 2] unclean leader election is enabled: if an out-of-sync follower
> becomes the new leader, messages that were not yet replicated to that
> new leader are lost.
>
> Message reordering might happen when:
>
> 1] max.in.flight.requests.per.connection > 1 and retries are enabled
>
> 2] a producer is not closed correctly, e.g. without calling .close().
> The close method ensures that the accumulator is closed first, to
> guarantee that no more appends are accepted after breaking the send
> loop.
>
> If you want to avoid these cases:
>
> - close the producer in the error callback
>
> - close the producer with close(0) to prevent further sends after a
> previous message send failed
>
> To avoid data loss:
>
> - block.on.buffer.full=true
>
> - retries=Integer.MAX_VALUE
>
> - acks=all
>
> To avoid reordering:
>
> - max.in.flight.requests.per.connection=1 (be aware of the latency
> impact)
>
> Bear in mind that if your producer goes down, messages still in its
> buffer will be lost ... (perhaps manage local storage if you are
> punctilious).
>
> Moreover, at least two in-sync replicas are needed at any time to
> guarantee data persistence, e.g. replication factor = 3,
> min.insync.replicas = 2, unclean leader election disabled.
>
> Also keep in mind that the consumer can lose messages when offsets are
> not committed correctly. Disable auto-commit (enable.auto.commit=false)
> and commit offsets only after doing your work for each message (or
> commit several processed messages at a time, keeping them in a local
> in-memory buffer).
>
> I hope these suggestions help you 😊
>
> Best regards,
>
> Adrien
>
> ________________________________
> From: Wim Van Leuven <wim.vanleu...@highestpoint.biz>
> Sent: Thursday, 8 March 2018 21:35:13
> To: users@kafka.apache.org
> Subject: Delayed processing
>
> Hello,
>
> I'm wondering how to design a KStreams or regular Kafka application
> that can hold off processing of messages until a future time.
>
> This relates to the EU's data protection regulation: we can store raw
> messages for a given time; afterwards we have to store the anonymised
> message. So I was thinking about branching the stream, anonymising the
> messages into a waiting topic, and then continuing from there once the
> retention time passes.
>
> But that approach has some caveats:
>
> - this is not an exact solution, as the order of events is not
> guaranteed: we might encounter a message that triggers the stop of
> processing while some events arriving later should normally still pass
> - how do we properly stop processing if we encounter a message that
> indicates not to continue?
> - ...
>
> Are there better-known solutions or best practices to delay message
> processing with Kafka streams / consumers+producers?
>
> Thanks for any insights/help here!
> -wim
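Putting Adrien's producer advice together, a minimal sketch against a
2018-era Java client could look like this. The broker address, topic,
and record contents are placeholders; since the retries config is an
int, Integer.MAX_VALUE stands in for "retry forever", and newer
producers no longer expose block.on.buffer.full (they block on a full
buffer, bounded by max.block.ms).

import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderedDurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Avoid data loss: wait for all in-sync replicas, retry (practically) forever.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
        // Avoid reordering: one in-flight request per connection (costs latency/throughput).
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

        final KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("cdr-topic", "key", "value"), // placeholders
            (metadata, exception) -> {
                if (exception != null) {
                    // Adrien's "close(0)": stop immediately so nothing queued
                    // behind the failed send can still go out and be reordered.
                    producer.close(0, TimeUnit.MILLISECONDS);
                }
            });
        producer.close(); // normal shutdown: flushes the accumulator first
    }
}

The close(0, TimeUnit.MILLISECONDS) overload is the 2018-era spelling
of close(0); clients from 2.0 onwards take a Duration instead.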
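The consumer side of Adrien's last paragraph, at-least-once with manual
commits, could look roughly like this; the group id, topic, and the
process() body are placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "cdr-consumers");        // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit so offsets advance only after processing succeeded.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("cdr-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // Commit the batch only after every record in it was processed;
                // a crash before this line replays the batch (at-least-once).
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // do the real work here, before the offset is committed
    }
}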
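Finally, the branching idea from Wim's original mail could be sketched
with Kafka Streams roughly as below. The topic names, application id,
and anonymise function are all hypothetical; this only shows the branch
into a raw topic and an anonymised "waiting" topic, not the delayed
consumption itself, which is the open question of the thread:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class AnonymisingBranch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdr-anonymiser"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> cdrs = builder.stream("cdr-input"); // placeholder topic
        // Keep the raw records in a topic whose retention matches the legal
        // storage window for raw data ...
        cdrs.to("cdr-raw");
        // ... and write an anonymised copy to the waiting topic that
        // downstream consumers switch to once the retention time passes.
        cdrs.mapValues(AnonymisingBranch::anonymise).to("cdr-anonymised");

        new KafkaStreams(builder.build(), props).start();
    }

    // Hypothetical anonymisation; real CDR masking logic would go here.
    static String anonymise(String cdr) {
        return cdr.replaceAll("[0-9]", "x");
    }
}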