Sounds reasonable to me.

-Matthias

On 3/22/19 9:50 AM, Tim Gent wrote:
> Hi all,
> 
> We have a data processing system where a daily batch process generates
> some data into a Kafka topic. This then goes through several other
> components that enrich the data; these are also integrated via Kafka.
> So overall we have something like:
> 
> Batch job -> topic A -> streaming app 2 -> topic B -> streaming app 3
> 
> We would like to know when all the data generated onto topic A finally
> gets processed by streaming app 3, as we may trigger some other
> processes from this (e.g. notifying customers their data is processed
> for that day). We've come up with a possible solution, and it would be
> great to get feedback to see what we missed.
> 
> Assumptions:
> - Consumers all track their offsets using Kafka, committing once
> they've done all required processing for a message
> - We have some "batch-monitor" component which will track progress,
> described below
> - It isn't important to us to know exactly when the batch finished
> processing; finding out soon after it has finished is good
> enough
> 
> Broad flow:
> - Batch job reads some input data and publishes output to topic A
> - Batch job sends data to our "batch-monitor" component about the
> offsets on each partition at the time it finishes its processing
> - "batch-monitor" subscribes to the topic containing the committed
> offsets for topic A for streaming app 2 consumer
> - "batch-monitor" can therefore see when streaming app 2 has committed
> all the offsets that were in the batch
> - Once "batch-monitor" detects that streaming app 2 has finished its
> processing for the batch, it records max offsets for all partitions for
> messages in topic B -> these can be used to know when streaming app 3
> has finished processing the batch
> - "batch-monitor" subscribes to the topic containing the committed
> offsets for topic B for streaming app 3 consumer
> - "batch-monitor" can therefore see when streaming app 3 has committed
> all the offsets that were in the batch
> - Once that happens "batch-monitor" can send some notification somewhere else
> 
> Any thoughts gratefully received
> 
> Tim
> 
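
For reference, the two-stage check in the broad flow above could be sketched roughly as follows. This is only an illustration, not part of the original proposal: the names BatchMonitor, caught_up and read_end_offsets_b are hypothetical, and offsets are passed in as plain dicts rather than read from Kafka. It assumes the usual Kafka convention that a committed offset is the position of the next message to consume, so a partition is done once its committed offset reaches the end offset recorded for the batch. It also assumes streaming app 2 has written its output to topic B before it commits the matching offsets on topic A.

```python
from typing import Callable, Dict, Optional

Offsets = Dict[int, int]  # partition number -> offset


def caught_up(committed: Offsets, target: Offsets) -> bool:
    """True once every target partition's committed offset has reached its target."""
    return all(committed.get(p, 0) >= off for p, off in target.items())


class BatchMonitor:
    """Hypothetical sketch of the "batch-monitor" bookkeeping."""

    def __init__(self, target_a: Offsets, read_end_offsets_b: Callable[[], Offsets]):
        # End offsets on topic A, as reported by the batch job when it finished.
        self.target_a = target_a
        # Assumed callback that returns the current end offsets of topic B.
        self.read_end_offsets_b = read_end_offsets_b
        self.target_b: Optional[Offsets] = None
        self.committed_app2: Offsets = {}
        self.committed_app3: Offsets = {}

    def on_app2_commit(self, partition: int, offset: int) -> None:
        """Record an offset commit by streaming app 2 on topic A."""
        self.committed_app2[partition] = offset
        # The first time app 2 has caught up, snapshot topic B's end offsets.
        if self.target_b is None and caught_up(self.committed_app2, self.target_a):
            self.target_b = self.read_end_offsets_b()

    def on_app3_commit(self, partition: int, offset: int) -> None:
        """Record an offset commit by streaming app 3 on topic B."""
        self.committed_app3[partition] = offset

    def batch_complete(self) -> bool:
        """True once app 3 has committed past the snapshot taken on topic B."""
        return self.target_b is not None and caught_up(self.committed_app3, self.target_b)
```

Because the topic B snapshot is taken only after app 2's commits have been observed, it may include messages that do not belong to the batch. That makes the completion signal somewhat late rather than early, which seems to fit the "sometime soon after" assumption.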
