Sounds reasonable to me. -Matthias
On 3/22/19 9:50 AM, Tim Gent wrote:
> Hi all,
>
> We have a data processing system where a daily batch process generates
> some data into a Kafka topic. This then goes through several other
> components that enrich the data; these are also integrated via Kafka.
> So overall we have something like:
>
> Batch job -> topic A -> streaming app 2 -> topic B -> streaming app 3
>
> We would like to know when all the data generated onto topic A finally
> gets processed by streaming app 3, as we may trigger some other
> processes from this (e.g. notifying customers their data is processed
> for that day). We've come up with a possible solution, and it would be
> great to get feedback to see what we missed.
>
> Assumptions:
> - Consumers all track their offsets using Kafka, committing once
>   they've done all required processing for a message
> - We have some "batch-monitor" component which will track progress,
>   described below
> - It isn't important to us to know exactly when the batch finished
>   processing; sometime soon after the batch finished processing is
>   good enough
>
> Broad flow:
> - Batch job reads some input data and publishes output to topic A
> - Batch job sends data to our "batch-monitor" component about the
>   offsets on each partition at the time it finishes its processing
> - "batch-monitor" subscribes to the topic containing the committed
>   offsets for topic A for streaming app 2 consumer
> - "batch-monitor" can therefore see when streaming app 2 has committed
>   all the offsets that were in the batch
> - Once "batch-monitor" detects that streaming app 2 has finished its
>   processing for the batch it records max offsets for all partitions for
>   messages in topic B -> these can be used to know when streaming app 3
>   has finished processing the batch
> - "batch-monitor" subscribes to the topic containing the committed
>   offsets for topic B for streaming app 3 consumer
> - "batch-monitor" can therefore see when streaming app 3 has committed
>   all the offsets that were in the batch
> - Once that happens "batch-monitor" can send some notification
>   somewhere else
>
> Any thoughts gratefully received
>
> Tim
>
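
For illustration, the completion check that "batch-monitor" performs in the flow quoted above could be sketched roughly as follows. This is only a minimal sketch and not part of the original proposal: the function names and the dict-based data shapes are hypothetical, and it assumes the batch job reports, per partition, the offset of the last record it wrote. Since a committed offset in Kafka is the position of the next record to consume, a partition counts as done once its committed offset is strictly greater than that last batch offset.

```python
from typing import Dict, List, Optional

# Hypothetical shape: partition number -> offset.
Offsets = Dict[int, int]


def partition_done(last_batch_offset: int, committed: Optional[int]) -> bool:
    """Check one partition.

    A committed offset is the offset of the NEXT record to consume, so the
    last batch record has been processed once committed > last_batch_offset.
    A missing commit (None) means nothing has been committed yet.
    """
    return committed is not None and committed > last_batch_offset


def pending_partitions(last_batch_offsets: Offsets,
                       committed_offsets: Offsets) -> List[int]:
    """Return the partitions whose batch records are not yet fully committed."""
    return sorted(
        partition
        for partition, last in last_batch_offsets.items()
        if not partition_done(last, committed_offsets.get(partition))
    )


def batch_done(last_batch_offsets: Offsets, committed_offsets: Offsets) -> bool:
    """True once every partition the batch wrote to has been fully committed."""
    return not pending_partitions(last_batch_offsets, committed_offsets)


if __name__ == "__main__":
    # Assumed example values: the batch wrote up to offset 41 on partition 0
    # and up to offset 17 on partition 1 of topic A.
    batch = {0: 41, 1: 17}
    committed_by_app2 = {0: 42, 1: 17}  # partition 1 still has record 17 to do
    print(batch_done(batch, committed_by_app2))
    print(pending_partitions(batch, committed_by_app2))
```

The same check would apply unchanged to the second hop (topic B and streaming app 3), using the max offsets recorded on topic B once the first hop reports done. If the batch job instead reports end offsets (last offset plus one), the comparison would become `committed >= end_offset`.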