Hi

I am looking at Samza to process some incoming stream of trades. Processing 
pipeline is a complex DAG where some nodes might create zero to many descendant 
events. Ultimately they got to the end sink (and now these are completely 
different events all originated by one source trade)

I am looking for a answer to quite fundamental (in my view) question - given 
source trade, how could I know if it was fully processed by DAG or is it still 
in flight? It is called lineage tracking in Storm afaik.

In my situation when trade event is fired I need a reliable way to determine 
when processing is done in order to treat results. I googled a lot of 
algorithms - checkpointing in Flink, transactions in Storm. Checkpointing in 
Flink could work but unfortunately I can’t use them from my source to mark a 
trade with checkpoint barrier - API is not available. Looking at Samza there’s 
nothing obvious either.

I have a very uncomfortable feeling that the problem should be very basic and 
fundamental, maybe I am using wrong terms to explain it. take it to extreme - 
one can send 1 event to Samza and processing is takes 15 minutes on some node. 
How on the end of the pipeline should I know when its done? 

Thank you

Reply via email to