I have a job that basically joins several different inputs together, all partitioned by the same key.

I originally took the typical approach and created a pipeline consisting of N-1 successive joins, each one implemented using a DataStream co-process function.
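Schematically, each stage looks like the following self-contained sketch (the types A/B/AB and the join logic are made up for illustration; my actual functions keep more state and handle cleanup):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class SuccessiveJoins {

  // Made-up input/output types; the real job has N different inputs.
  public static class A { public String key = "k"; public int value; }
  public static class B { public String key = "k"; public long value; }
  public static class AB { public String key; public A a; public B b; }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<A> as = env.fromElements(new A()); // stand-in sources
    DataStream<B> bs = env.fromElements(new B());

    // One join stage; the real pipeline chains N-1 of these,
    // re-keying (and hence shuffling) before each stage.
    DataStream<AB> joined = as
        .connect(bs)
        .keyBy(a -> a.key, b -> b.key)
        .process(new JoinAB());

    joined.print();
    env.execute();
  }

  // Each stage buffers the latest element from both sides in keyed state,
  // which is where the state duplication across operators comes from.
  public static class JoinAB extends KeyedCoProcessFunction<String, A, B, AB> {
    private transient ValueState<A> lastA;
    private transient ValueState<B> lastB;

    @Override
    public void open(Configuration conf) {
      lastA = getRuntimeContext().getState(new ValueStateDescriptor<>("lastA", A.class));
      lastB = getRuntimeContext().getState(new ValueStateDescriptor<>("lastB", B.class));
    }

    @Override
    public void processElement1(A a, Context ctx, Collector<AB> out) throws Exception {
      lastA.update(a);
      if (lastB.value() != null) out.collect(join(a, lastB.value()));
    }

    @Override
    public void processElement2(B b, Context ctx, Collector<AB> out) throws Exception {
      lastB.update(b);
      if (lastA.value() != null) out.collect(join(lastA.value(), b));
    }

    private AB join(A a, B b) {
      AB ab = new AB();
      ab.key = a.key; ab.a = a; ab.b = b;
      return ab;
    }
  }
}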
To avoid shuffling and also some state duplication across operators, I am now considering the following alternative design:

- Collapse the whole pipeline into a single (fat) operator
- Have this operator process all the inputs, effectively treating all but the main one as side inputs

Since Flink does not support side inputs yet, they need to be simulated, e.g., by unioning all the different inputs into a sum type (a tuple or a POJO with one field for each type of input); see the sketch after my signature.

Has anyone experimented with these two (somewhat dual) approaches? If so, could you provide some guidance/advice on deciding which one to use?

On a related note, are there any plans to move FLIP-17 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-17%3A+Side+Inputs+for+DataStream+API> forward?

Regards,

Salva
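P.S. For concreteness, here is a self-contained sketch of the union-based design I have in mind (again, all names and types are made up; the real job has more inputs and actual join logic in the branches):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class FatOperator {

  // Sum type: exactly one of the input fields is set per record.
  public static class Union3 {
    public String key;
    public Integer a; // input A
    public Long b;    // input B
    public String c;  // input C
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Stand-in sources, already mapped into the sum type.
    DataStream<Union3> as = env.fromElements(wrap("k", 1, null, null));
    DataStream<Union3> bs = env.fromElements(wrap("k", null, 2L, null));
    DataStream<Union3> cs = env.fromElements(wrap("k", null, null, "x"));

    as.union(bs, cs)
        .keyBy(u -> u.key) // a single shuffle for all inputs
        .process(new KeyedProcessFunction<String, Union3, String>() {
          // All per-key state for all inputs lives in this one (fat) operator.
          @Override
          public void processElement(Union3 u, Context ctx, Collector<String> out) {
            if (u.a != null) {
              // main input: look up the latest B/C state and emit the join
            } else if (u.b != null) {
              // simulated side input: update B state
            } else if (u.c != null) {
              // simulated side input: update C state
            }
            out.collect(u.key);
          }
        })
        .print();

    env.execute();
  }

  private static Union3 wrap(String key, Integer a, Long b, String c) {
    Union3 u = new Union3();
    u.key = key; u.a = a; u.b = b; u.c = c;
    return u;
  }
}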