Hello all,

This question is not strictly related to Kafka itself but rather to a streaming
design using Kafka. I hope it still falls within the scope of this list.

I would like to distribute the processing of a monolithic batch across a
streaming DAG with multiple parallel branches, each branch composed of one or
several streaming micro-services performing tasks in sequence, with
communication between micro-services done through Kafka.
The problem is to detect, as soon as it happens, that all subtasks have been
processed (successfully or not) within the DAG, something that is rather
trivial in the monolithic design.

To solve this I’m thinking about a special “control topic”, responsible for
carrying end-of-processing statuses, and a dedicated “controller”
micro-service responsible for aggregating those statuses and proactively
reporting the batch-end event.
Given that there will be a process, let’s call it “the feeder”, injecting all
batch subtasks into the primary input topic of the DAG, the idea in a nutshell
would be:
- To have the feeder first produce a START <batchID> event to the control
topic, so that the “controller” micro-service initializes its state for the
given batchID.
- To have the feeder inject all subtasks, in sequence, into the primary input
topic.
- To have the feeder inject, right after the subtasks, a special END <batchID>
event into the primary input topic. This END event is special because, instead
of being sent over Kafka like any other message, i.e. to a single partition,
it would be broadcast to every partition of the topic, and cascaded through
the DAG in the same way, so that it is guaranteed to pass through absolutely
all partitions (and all consumers) of the global streaming graph (see the
feeder sketch after this list).
- To have each consumer micro-service that receives this END message (so
possibly multiple times) report on the control topic that the END message has
been received, together with some IDs (batch ID and partition ID), as in the
worker sketch below.
- To have the “controller” micro-service increment a counter each time it
receives an END event for a given batch ID. As the total number of partitions
in the DAG is relatively static, this number can be pre-configured in the
controller. When the number of received END events matches the total number of
partitions, the batch has been fully processed (see the controller sketch
below).
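
To make this concrete, here is a rough sketch of the feeder in plain Java.
The topic names (“batch-input”, “control-topic”), the message formats and the
batch size are just placeholders for illustration, not anything standard:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

import java.util.List;
import java.util.Properties;

public class Feeder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String batchId = "batch-42"; // placeholder batch ID
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 1. Announce the batch on the control topic so the controller
            //    can initialize its state for this batchId.
            producer.send(new ProducerRecord<>("control-topic", batchId, "START " + batchId));

            // 2. Inject all subtasks into the primary input topic
            //    (normal key-based partitioning).
            for (int i = 0; i < 1000; i++) {
                producer.send(new ProducerRecord<>("batch-input", "task-" + i, "payload-" + i));
            }

            // 3. Broadcast the END marker to *every* partition of the input
            //    topic by addressing each partition explicitly.
            List<PartitionInfo> partitions = producer.partitionsFor("batch-input");
            for (PartitionInfo p : partitions) {
                producer.send(new ProducerRecord<>("batch-input", p.partition(),
                        batchId, "END " + batchId));
            }
            producer.flush();
        }
    }
}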
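
A worker micro-service in the DAG would then look roughly like this (again,
names and formats are placeholders; a real service would also forward its
results downstream and cascade the END marker the same way):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class Worker {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "worker-service-1");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("batch-input"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                    if (r.value().startsWith("END")) {
                        // Report END receipt for this partition to the controller.
                        String batchId = r.key();
                        producer.send(new ProducerRecord<>("control-topic", batchId,
                                "END " + batchId + " partition=" + r.partition()));
                    } else {
                        // ... process the subtask, then forward downstream ...
                    }
                }
            }
        }
    }
}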
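
And the controller would count END reports per batch ID, with the total
number of DAG partitions pre-configured (24 here, purely illustrative; I also
assume a single-partition control topic so the controller sees all reports):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class Controller {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "batch-controller");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Pre-configured: total number of partitions across the whole DAG
        // that must each see the END marker (the coupling I dislike).
        final int totalDagPartitions = 24;

        Map<String, Integer> endCounts = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("control-topic"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                    String batchId = r.key();
                    if (r.value().startsWith("START")) {
                        endCounts.put(batchId, 0); // initialize state for this batch
                    } else if (r.value().startsWith("END")) {
                        int n = endCounts.merge(batchId, 1, Integer::sum);
                        if (n == totalDagPartitions) {
                            // All partitions saw the END marker: the batch is done.
                            System.out.println("Batch " + batchId + " fully processed");
                        }
                    }
                }
            }
        }
    }
}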

What do you think about this design?
Would you have more elegant ideas for solving this problem? I don’t much like
the idea of broadcasting messages to all partitions, nor of coupling the
controller’s configuration to the total number of partitions, even if
technically it seems to work.

Many thanks,
- Pierre
