Hi Salva, I've done exactly that (a union of N streams in order to perform a join), and gave a talk about it at Flink Forward a few years ago: https://www.youtube.com/watch?v=tiGxEGPyqCg&ab_channel=FlinkForward
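The core of the pattern looks roughly like this. This is a plain-Java sketch of the idea only, not actual Flink code; all class and method names here are illustrative. Each input is tagged with its stream id so everything fits into one sum type, and a single keyed "fat" operator keeps per-stream state and emits a joined row once every input has contributed for a key:

```java
import java.util.*;

// Illustrative sketch (not Flink API): simulate side inputs by tagging
// each element with its source stream id, unioning everything into one
// sequence, and processing it in a single keyed operator.
public class UnionJoinSketch {

    // Sum type: one record from any of the N input streams.
    static final class Tagged {
        final int streamId;   // which input this element came from
        final String key;     // join key (all inputs partitioned by it)
        final String value;
        Tagged(int streamId, String key, String value) {
            this.streamId = streamId;
            this.key = key;
            this.value = value;
        }
    }

    // The single "fat" operator: per key, keep the latest value seen from
    // each stream; emit a joined row once all streams have contributed.
    static List<String> process(List<Tagged> union, int numStreams) {
        Map<String, Map<Integer, String>> state = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Tagged t : union) {
            Map<Integer, String> perKey =
                state.computeIfAbsent(t.key, k -> new HashMap<>());
            perKey.put(t.streamId, t.value);
            if (perKey.size() == numStreams) {   // all sides present
                out.add(t.key + " -> " + new TreeMap<>(perKey).values());
            }
        }
        return out;
    }
}
```

In a real Flink job the `state` map would be keyed state inside a `KeyedProcessFunction` (so it is checkpointed and scoped per key automatically), and the tagged union would be built with `DataStream.union` after mapping each input into the common wrapper type.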
On Wed, Dec 4, 2024 at 5:03 AM Salva Alcántara <salcantara...@gmail.com> wrote:

> I have a job which basically joins different inputs together, all
> partitioned by the same key.
>
> I originally took the typical approach and created a pipeline consisting
> of N-1 successive joins, each one implemented using a DataStream co-process
> function.
>
> To avoid shuffling and also some state duplication across operators, I am
> now considering the following alternative design:
>
> - Collapse the whole pipeline into a single (fat) operator
> - This operator will process all the inputs, effectively
>
> Since Flink does not support side inputs yet, they need to be simulated,
> e.g., by unioning all the different inputs into a sum type (a tuple or a
> POJO with one field for each type of input).
>
> Has anyone experimented with these two (somewhat dual) approaches? If so,
> could you provide some guidance/advice to decide which one to use?
>
> On a related note, are there any plans to move FLIP-17
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-17%3A+Side+Inputs+for+DataStream+API>
> forward?
>
> Regards,
>
> Salva