I have a job that basically joins a number of different inputs, all
partitioned by the same key.

I originally took the typical approach and built a pipeline consisting of
N-1 successive two-way joins, each one implemented with a CoProcessFunction
on connected DataStreams.
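
For concreteness, each stage currently looks roughly like this (A and B are
placeholder input types, and the keep-latest-from-each-side semantics is
just a simplification of the real join logic):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.util.Collector;

    // Placeholder input types, standing in for two of the real inputs.
    class A { public String key; }
    class B { public String key; }

    // One stage of the chain: keep the latest element from each side in
    // keyed state and emit a pair whenever both sides have been seen.
    public class JoinAB extends CoProcessFunction<A, B, Tuple2<A, B>> {
        private transient ValueState<A> lastA;
        private transient ValueState<B> lastB;

        @Override
        public void open(Configuration conf) {
            lastA = getRuntimeContext().getState(new ValueStateDescriptor<>("lastA", A.class));
            lastB = getRuntimeContext().getState(new ValueStateDescriptor<>("lastB", B.class));
        }

        @Override
        public void processElement1(A a, Context ctx, Collector<Tuple2<A, B>> out) throws Exception {
            lastA.update(a);
            if (lastB.value() != null) out.collect(Tuple2.of(a, lastB.value()));
        }

        @Override
        public void processElement2(B b, Context ctx, Collector<Tuple2<A, B>> out) throws Exception {
            lastB.update(b);
            if (lastA.value() != null) out.collect(Tuple2.of(lastA.value(), b));
        }
    }

    // Wiring (as/bs stand for the two input DataStreams); every further
    // input adds another keyBy/connect/process stage of this shape:
    DataStream<Tuple2<A, B>> joined = as.keyBy(a -> a.key)
        .connect(bs.keyBy(b -> b.key))
        .process(new JoinAB());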

To avoid shuffling and also some state duplication across operators, I am
now considering the following alternative design:

- Collapse the whole pipeline into a single (fat) operator
- Have this one operator process all the inputs, effectively performing
the N-way join in one place

Since Flink does not support side inputs yet, they have to be simulated,
e.g., by unioning all the different inputs into a single stream of a sum
type (a tuple or a POJO with one field for each type of input).
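
Here is a simplified sketch of what I have in mind for this variant,
reusing the placeholder types A and B from above (InputUnion and the
keep-latest join logic are again only for illustration):

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    // Third placeholder input type, in addition to A and B above.
    class C { public String key; }

    // Poor man's sum type: exactly one of a/b/c is non-null per element.
    class InputUnion {
        public String key;
        public A a;
        public B b;
        public C c;
    }

    // Wrap each input into the union type, union the streams, key once.
    // as, bs, cs stand for DataStream<A>, DataStream<B>, DataStream<C>.
    DataStream<InputUnion> all =
        as.map(x -> { InputUnion u = new InputUnion(); u.key = x.key; u.a = x; return u; })
          .returns(InputUnion.class)
          .union(
              bs.map(x -> { InputUnion u = new InputUnion(); u.key = x.key; u.b = x; return u; })
                .returns(InputUnion.class),
              cs.map(x -> { InputUnion u = new InputUnion(); u.key = x.key; u.c = x; return u; })
                .returns(InputUnion.class));

    DataStream<Tuple3<A, B, C>> joined = all
        .keyBy(u -> u.key)
        .process(new ProcessFunction<InputUnion, Tuple3<A, B, C>>() {
            private transient ValueState<A> lastA;
            private transient ValueState<B> lastB;
            private transient ValueState<C> lastC;

            @Override
            public void open(Configuration conf) {
                lastA = getRuntimeContext().getState(new ValueStateDescriptor<>("lastA", A.class));
                lastB = getRuntimeContext().getState(new ValueStateDescriptor<>("lastB", B.class));
                lastC = getRuntimeContext().getState(new ValueStateDescriptor<>("lastC", C.class));
            }

            @Override
            public void processElement(InputUnion u, Context ctx, Collector<Tuple3<A, B, C>> out)
                    throws Exception {
                // Dispatch on which field is set; all per-key state now
                // lives in this single operator instead of N-1 joins.
                if (u.a != null) lastA.update(u.a);
                if (u.b != null) lastB.update(u.b);
                if (u.c != null) lastC.update(u.c);
                if (lastA.value() != null && lastB.value() != null && lastC.value() != null) {
                    out.collect(Tuple3.of(lastA.value(), lastB.value(), lastC.value()));
                }
            }
        });

With this shape there is a single keyBy and a single copy of the per-key
state, which is exactly the duplication I am trying to avoid.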

Has anyone experimented with these two (somewhat dual) approaches? If so,
could you provide some guidance/advice to decide which one to use?

On a related note, are there any plans to move FLIP-17
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-17%3A+Side+Inputs+for+DataStream+API>
forward?

Regards,

Salva
