Hi Wim,
this topic (processing order) has been cropping up for a while; several
articles, benchmarks, and other thoughts on the subject
reach this conclusion.

After that, you can of course ask someone else for another opinion on the subject.

regards,

Adrien

________________________________
From: Wim Van Leuven <wim.vanleu...@highestpoint.biz>
Sent: Friday, 9 March 2018 08:03:15
To: users@kafka.apache.org
Subject: Re: Delayed processing

Hey Adrien,

thank you for the elaborate explanation!

We are ingesting call data records here, which due to the nature of a telco
network might not arrive in absolute logical order.

If I understand your explanation correctly, you are saying that with your
setup, Kafka guarantees the processing in order of ingestion of the
messages. Correct?

Thanks!
-wim

On Thu, 8 Mar 2018 at 22:58 adrien ruffie <adriennolar...@hotmail.fr> wrote:

> Hello Wim,
>
>
> Yes, it does matter (I think), because one of Kafka's main features is to
> load-balance messages and guarantee ordering (within a partition) in a
> distributed cluster.
>
>
> The order of the messages should be guaranteed, except in a few cases.
>
> Data loss might happen when:
>
> 1] the producer runs with block.on.buffer.full=false, retries are
> exhausted, or messages are sent without acks=all
>
> 2] unclean leader election is enabled: if an out-of-sync follower becomes
> the new leader, messages that were not replicated to it are lost.
>
>
> Message reordering might happen when:
>
> 1] max.in.flight.requests.per.connection > 1 and retries are enabled
>
> 2] a producer is not closed correctly, e.g. without calling .close()
>
> The close method ensures that the accumulator is closed first, which
> guarantees that no more appends are accepted after breaking the send loop.
>
>
>
> If you want to avoid these cases:
>
> - close the producer in the error callback
>
> - close the producer with close(0) to prevent further sends after a
> previous message send has failed
>
>
> Avoid data loss:
>
> - block.on.buffer.full=true
>
> - retries=Integer.MAX_VALUE
>
> - acks=all
>
>
> Avoid reordering:
>
> - max.in.flight.requests.per.connection=1 (be aware of the latency cost)
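
As a rough illustration of the producer settings listed above, here is a minimal sketch in Java that only builds the configuration object. The class name, the broker address and the String serializers are assumptions made for the example; block.on.buffer.full is left out because it was deprecated in favour of max.block.ms in later client versions.

```java
import java.util.Properties;

// Hypothetical helper: collects the "avoid data loss" and "avoid reordering"
// producer settings from this thread into one Properties object.
final class SafeProducerConfig {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Serializers are an assumption; pick the ones matching your data.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Avoid data loss: wait for all in-sync replicas and keep retrying.
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));

        // Avoid reordering: only one unacknowledged request per connection,
        // so a retried batch cannot overtake a later one (costs throughput).
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties can be passed to the KafkaProducer constructor once the kafka-clients library is on the classpath.
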
>
>
> Be aware that if your producer goes down, messages still in its buffer
> will be lost (perhaps manage local storage if you want to be thorough).
>
> Moreover, at least two in-sync replicas are needed at any time to guarantee
> data persistence, for example: replication factor = 3,
> min.insync.replicas = 2, unclean leader election disabled.
>
>
> Also keep in mind that a consumer can lose messages when offsets are not
> committed correctly. Disable auto commit (enable.auto.commit=false) and
> commit offsets only after you have finished processing each message (or
> keep several processed messages in a local memory buffer and commit them
> at one time).
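
The consumer side of that advice could be sketched like this in Java. Again this only builds the configuration; the class name, group id, broker address and String deserializers are assumptions for the example.

```java
import java.util.Properties;

// Hypothetical helper: consumer settings with automatic offset commits
// switched off, so the application decides when an offset is committed.
final class ManualCommitConsumerConfig {

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Deserializers are an assumption; pick the ones matching your data.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Offsets are committed by the application, not by a background timer.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group id.
        build("localhost:9092", "cdr-processor")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With these settings, the application would poll a KafkaConsumer, finish processing the returned records, and only then call commitSync(), so a crash in the middle of a batch leads to reprocessing rather than to skipped messages.
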
>
>
> I hope these suggestions help you 😊
>
>
> Best regards,
>
> Adrien
>
> ________________________________
> From: Wim Van Leuven <wim.vanleu...@highestpoint.biz>
> Sent: Thursday, 8 March 2018 21:35:13
> To: users@kafka.apache.org
> Subject: Delayed processing
>
> Hello,
>
> I'm wondering how to design a KStreams or regular Kafka application that
> can hold off processing of messages until a future time.
>
> This is related to the EU's data protection regulation: we can store raw
> messages for a given time; afterwards we have to store the anonymised
> message. So, I was thinking about branching the stream, anonymising the
> messages into a waiting topic and then continue from there until the
> retention time passes.
>
> But that approach has some caveats:
>
>    - This is not an exact solution as order of events is not guaranteed: we
>    might encounter a message that triggers the stop processing while some
>    events arriving later should normally still pass
>    - how to properly stop processing if we encounter a message that
>    indicates to not continue?
>    - ...
>
> Are there better-known solutions or best practices to delay message
> processing with Kafka streams / consumers+producers?
>
> Thanks for any insights/help here!
> -wim
>
