Hi,

Our data processing pipeline consists of a set of Samza jobs, that form a
DAG. Sometimes, we have to throw finite datasets into the Kafka topic that
acts as the entry point to the pipeline. Given that different Samza jobs in
the DAG could have varying latencies in terms of processing the records (or
could even temporarily fails or be stuck), how do I detect that my assembly
of jobs have finished processing all records? It's not as simple as
tallying the input and output record counts, as some jobs could be
filtering data, and others could be grouping records etc.

Thanks,

Kishore.

Reply via email to