Hello Wim,

Thanks for your explanations, it makes more sense to me now. I think your
scenario may be better described as a "re-processing" case than a "delayed
processing" case, since you are effectively processing the un-anonymised
data once and sending the results to the topic; then, later, you re-process
the same but now anonymised data and send the results to the same topic.
In Kafka Streams, late-arriving records can be handled correctly since the
library supports timestamp-based processing order, and in practice you can
specify "windowing" operations that wait for late-arriving records before
applying them. You can read more about that in the existing docs below.
Please let me know if you have further questions:

https://kafka.apache.org/0110/documentation/streams/core-concepts#streams_time
https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-joins

Guozhang

On Thu, Mar 8, 2018 at 11:07 PM, Wim Van Leuven <
wim.vanleu...@highestpoint.biz> wrote:

> Hello Guozhang,
>
> We ingest messages that we outgest to a user-facing datastore, after some
> additional processing. Due to GDPR, we can only retain that unanonymised
> data for a maximum period of time. Let's say 6 months.
>
> So, right before sending the data to the out topic, we'll branch the data
> into an anonymisation leg of the processing chain. Now, we want to keep
> the messages in that chain until the expiration of the unanonymised
> messages. At that point we want to send the anonymous record to the
> outgoing topic so that the anonymous message replaces the unanonymised
> version.
>
> Does that make the problem/idea more clear?
> --
> wim
>
> On Fri, 9 Mar 2018 at 00:47 Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Wim,
> >
> > Not sure if I understand your motivations for delayed processing; could
> > you elaborate a bit more? Do you want to process raw messages, or do
> > you want to process anonymised messages?
> >
> > Guozhang
> >
> > On Thu, Mar 8, 2018 at 12:35 PM, Wim Van Leuven <
> > wim.vanleu...@highestpoint.biz> wrote:
> >
> > > Hello,
> > >
> > > I'm wondering how to design a KStreams or regular Kafka application
> > > that can hold off processing of messages until a future time.
> > >
> > > This is related to the EU's data protection regulation: we can store
> > > raw messages for a given time; afterwards we have to store the
> > > anonymised message. So, I was thinking about branching the stream,
> > > anonymising the messages into a waiting topic, and then continuing
> > > from there once the retention time passes.
> > >
> > > But that approach has some caveats:
> > >
> > >    - This is not an exact solution, as the order of events is not
> > >    guaranteed: we might encounter a message that triggers the stop of
> > >    processing while some events arriving later should normally still
> > >    pass
> > >    - How to properly stop processing if we encounter a message that
> > >    indicates to not continue?
> > >    - ...
> > >
> > > Are there better known solutions or best practices to delay message
> > > processing with Kafka streams / consumers+producers?
> > >
> > > Thanks for any insights/help here!
> > > -wim
> >
> > --
> > -- Guozhang

--
-- Guozhang
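[Editor's note] The "hold the anonymised copy until the unanonymised
original expires" idea from the thread can be sketched, independently of
Kafka, as a buffer that releases records by a retention deadline. This is a
minimal illustrative sketch, not Kafka Streams API: the class name, the
retention constant, and the polling loop are all hypothetical. In an actual
Kafka Streams deployment, the buffer would live in a state store and the
polling would be driven by a punctuator scheduled through the Processor API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PendingRecord:
    release_at: float                       # timestamp when the record may be emitted
    payload: dict = field(compare=False)    # the anonymised message body

class DelayedReleaseBuffer:
    """Hypothetical sketch: holds anonymised records until the
    retention period of the unanonymised original has elapsed."""

    def __init__(self, retention_seconds: float):
        self.retention = retention_seconds
        self._heap: list[PendingRecord] = []  # min-heap ordered by release_at

    def add(self, ingest_ts: float, payload: dict) -> None:
        # Schedule the anonymised copy for release when retention elapses.
        heapq.heappush(self._heap,
                       PendingRecord(ingest_ts + self.retention, payload))

    def due(self, now: float) -> list[dict]:
        # Pop every record whose release time has passed, in timestamp order.
        released = []
        while self._heap and self._heap[0].release_at <= now:
            released.append(heapq.heappop(self._heap).payload)
        return released

# Usage sketch with a 10-second retention (6 months in the real scenario):
buf = DelayedReleaseBuffer(10)
buf.add(0, {"id": 1})
buf.add(5, {"id": 2})
print(buf.due(12))   # only the record ingested at t=0 has expired by t=12
```

Note that releasing by ingest timestamp rather than arrival order sidesteps
the out-of-order caveat raised in the thread: a late-arriving record gets
its own deadline instead of being cut off by a "stop" marker.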