Hi, Our data processing pipeline consists of a set of Samza jobs, that form a DAG. Sometimes, we have to throw finite datasets into the Kafka topic that acts as the entry point to the pipeline. Given that different Samza jobs in the DAG could have varying latencies in terms of processing the records (or could even temporarily fails or be stuck), how do I detect that my assembly of jobs have finished processing all records? It's not as simple as tallying the input and output record counts, as some jobs could be filtering data, and others could be grouping records etc.
Thanks, Kishore.