Hi Salva, I've done exactly that (a union of N streams in order to perform a join), and gave a talk about it at Flink Forward a few years ago: https://www.youtube.com/watch?v=tiGxEGPyqCg&ab_channel=FlinkForward
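The core of the pattern looks roughly like this. This is a plain-Java sketch of the idea only, not actual Flink code; all class and method names here are illustrative. Each input is tagged with its stream id so everything fits into one sum type, and a single keyed "fat" operator keeps per-stream state and emits a joined row once every input has contributed for a key:

```java
import java.util.*;

// Illustrative sketch (not Flink API): simulate side inputs by tagging
// each element with its source stream id, unioning everything into one
// sequence, and processing it in a single keyed operator.
public class UnionJoinSketch {

    // Sum type: one record from any of the N input streams.
    static final class Tagged {
        final int streamId;   // which input this element came from
        final String key;     // join key (all inputs partitioned by it)
        final String value;
        Tagged(int streamId, String key, String value) {
            this.streamId = streamId;
            this.key = key;
            this.value = value;
        }
    }

    // The single "fat" operator: per key, keep the latest value seen from
    // each stream; emit a joined row once all streams have contributed.
    static List<String> process(List<Tagged> union, int numStreams) {
        Map<String, Map<Integer, String>> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Tagged t : union) {
            Map<Integer, String> perKey =
                state.computeIfAbsent(t.key, k -> new HashMap<>());
            perKey.put(t.streamId, t.value);
            if (perKey.size() == numStreams) {   // all sides present
                out.add(t.key + " -> " + new TreeMap<>(perKey).values());
            }
        }
        return out;
    }
}
```

In a real Flink job the `state` map would be keyed state inside a `KeyedProcessFunction` (so it is checkpointed and scoped per key automatically), and the tagged union would be built with `DataStream.union` after mapping each input into the common wrapper type.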
On Wed, Dec 4, 2024 at 5:03 AM Salva Alcántara <salcantara...@gmail.com> wrote:

> I have a job which basically joins different inputs together, all
> partitioned by the same key.
>
> I originally took the typical approach and created a pipeline consisting
> of N-1 successive joins, each one implemented using a DataStream co-process
> function.
>
> To avoid shuffling and also some state duplication across operators, I am
> now considering the following alternative design:
>
> - Collapse the whole pipeline into a single (fat) operator
> - This operator will process all the inputs, effectively
>
> Since Flink does not support side inputs yet, they need to be simulated,
> e.g., by unioning all the different inputs into a sum type (a tuple or a
> POJO with one field for each type of input).
>
> Has anyone experimented with these two (somewhat dual) approaches? If so,
> could you provide some guidance/advice to decide which one to use?
>
> On a related note, are there any plans to move FLIP-17
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-17%3A+Side+Inputs+for+DataStream+API>
> forward?
>
> Regards,
>
> Salva