Hello Wim,

I think it does matter, because one of Kafka's principal features is to
load-balance messages while guaranteeing their ordering (per partition) in a
distributed cluster.


Message delivery and ordering are guaranteed, except in a few cases.

Data loss can happen when:

1] the producer drops messages: block.on.buffer.full = false, retries are
exhausted, or messages are sent without acks=all

2] unclean leader election is enabled: if an out-of-sync follower becomes the
new leader, messages that were not replicated to it are lost.


Message reordering might happen when:

1] max.in.flight.requests.per.connection > 1 and retries are enabled

2] a producer is not closed correctly, e.g. without calling .close()

The close method ensures that the accumulator is closed first, which
guarantees that no more appends are accepted after the send loop is broken.
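
To illustrate reordering case 1], here is a small stand-alone simulation. It is
only a sketch and does not use the Kafka client: the class InFlightReorderingDemo
and its deliver method are hypothetical names, and the model (a failed batch is
simply re-queued behind the batches that were already in flight) is a
simplification of what a real producer and broker do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Kafka code: shows why retries plus several
// in-flight requests can reorder batches.
final class InFlightReorderingDemo {

    /**
     * Returns the order in which batch ids end up in the log.
     * Batches listed in failOnce fail on their first attempt and are retried.
     */
    static List<Integer> deliver(int batches, int maxInFlight, Set<Integer> failOnce) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int i = 0; i < batches; i++) {
            pending.add(i);
        }
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();

        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches without waiting for acknowledgements.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.poll());
            }
            // The broker handles the in-flight batches in send order.
            List<Integer> toRetry = new ArrayList<>();
            for (int batch : inFlight) {
                if (failOnce.contains(batch) && alreadyFailed.add(batch)) {
                    toRetry.add(batch);   // first attempt failed, retry later
                } else {
                    log.add(batch);       // appended to the log
                }
            }
            // Retried batches go back to the front of the queue, keeping their order.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return log;
    }
}
```

In this model, deliver(3, 2, {0}) yields the order 1, 0, 2, because batch 1 was
already in flight when batch 0 failed, whereas deliver(3, 1, {0}) keeps the
order 0, 1, 2.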



If you want to avoid these cases:

- close the producer in the error callback

- close the producer with close(0) to prevent further sends after a previous
message send has failed


To avoid data loss:

- block.on.buffer.full=true

- retries=Integer.MAX_VALUE (retries is an int setting)

- acks=all


To avoid reordering:

- max.in.flight.requests.per.connection=1 (be aware of the latency cost)
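
The producer settings above could be collected roughly as follows. This is a
sketch using plain java.util.Properties: the class name SafeProducerConfig and
the build method are hypothetical, and only the property keys are Kafka
settings. Since retries is an int setting, the sketch uses Integer.MAX_VALUE.
As far as I know, block.on.buffer.full only exists in older producer versions
(it was deprecated in favour of max.block.ms), so check it against your client
version.

```java
import java.util.Properties;

// Hypothetical helper: gathers the "no data loss, no reordering" producer settings.
final class SafeProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);

        // Avoid data loss.
        props.setProperty("acks", "all");
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        // Older producers only; newer clients use max.block.ms instead.
        props.setProperty("block.on.buffer.full", "true");

        // Avoid reordering, at the cost of throughput and latency.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }
}
```

The resulting Properties object would then be completed with serializers and
passed to the producer constructor.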


Note that if your producer goes down, messages still in its buffer will be
lost (consider maintaining local storage if you need to be thorough).

Moreover, at least two replicas are needed at any time to guarantee data
persistence. Example: replication factor = 3, min.insync.replicas = 2, unclean
leader election disabled.
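
As a rough illustration of that example, the sketch below writes down the
topic-level settings and the arithmetic behind them. The class
TopicDurabilitySketch and its methods are hypothetical names; the replication
factor itself is chosen when the topic is created and is not one of the config
keys shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the durability example (replication factor 3, min ISR 2).
final class TopicDurabilitySketch {

    /** Topic-level overrides matching the example above. */
    static Map<String, String> topicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("min.insync.replicas", "2");
        config.put("unclean.leader.election.enable", "false");
        return config;
    }

    /** With acks=all, a write is accepted only while enough replicas are in sync. */
    static boolean acceptsAcksAllWrite(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    /** Number of replicas that may drop out while acks=all writes still succeed. */
    static int replicasThatMayFail(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }
}
```

With replication factor 3 and min.insync.replicas 2, one replica may fall out
of sync and acks=all writes are still accepted; with only one in-sync replica
left they are rejected rather than stored on a single copy.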


Also keep in mind that a consumer can lose messages when offsets are not
committed correctly. Disable automatic offset commits (enable.auto.commit=false)
and commit offsets only after you have finished processing each message (or
keep several processed messages in a local memory buffer and commit them at one
time).
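
To show why the commit should come after the processing, here is a stand-alone
simulation. It is a sketch that does not use the Kafka consumer API: the class
CommitAfterProcessingDemo and its run method are hypothetical, and the "crash"
is modelled as the consumer stopping at a given offset before that message has
been processed, then restarting from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Kafka code: compares committing an offset before
// and after the message has been processed, across one crash and restart.
final class CommitAfterProcessingDemo {

    /** Returns the offsets that were actually processed, in processing order. */
    static List<Integer> run(int messageCount, int crashAtOffset, boolean commitBeforeProcessing) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;          // next offset to read after a restart
        boolean crashed = false;

        // First run: ends at the simulated crash.
        for (int offset = committed; offset < messageCount; offset++) {
            if (commitBeforeProcessing) {
                committed = offset + 1;   // offset committed too early
            }
            if (offset == crashAtOffset) {
                crashed = true;           // crash before this message is processed
                break;
            }
            processed.add(offset);
            if (!commitBeforeProcessing) {
                committed = offset + 1;   // commit only once the work is done
            }
        }

        // Restart: resume from the last committed offset.
        if (crashed) {
            for (int offset = committed; offset < messageCount; offset++) {
                processed.add(offset);
                committed = offset + 1;
            }
        }
        return processed;
    }
}
```

In this model, with 4 messages and a crash at offset 2, committing before
processing gives 0, 1, 3 (message 2 is lost), while committing after processing
gives 0, 1, 2, 3.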


I hope these suggestions help 😊


Best regards,

Adrien

________________________________
From: Wim Van Leuven <wim.vanleu...@highestpoint.biz>
Sent: Thursday, 8 March 2018 21:35:13
To: users@kafka.apache.org
Subject: Delayed processing

Hello,

I'm wondering how to design a KStreams or regular Kafka application that
can hold off processing of messages until a future time.

This relates to the EU's data protection regulation: we can store raw messages
for a given time; afterwards we have to store the anonymised message. So, I
was thinking about branching the stream, anonymising the messages into a
waiting topic and then continuing from there once the retention time passes.

But that approach has some caveats:

   - This is not an exact solution as order of events is not guaranteed: we
   might encounter a message that triggers the stop processing while some
   events arriving later should normally still pass
   - how to properly stop processing if we encounter a message that
   indicates to not continue?
   - ...

Are there better-known solutions or best practices to delay message
processing with Kafka streams / consumers+producers?

Thanks for any insights/help here!
-wim
