Hi Felix! Interesting suggestion. Here are some thoughts on the design.
The two core changes needed to send data once to the TaskManagers are: (1) Every sender needs to produce its stuff once (rather than for every target task), there should not be redundancy there. (2) Every TaskManager should request the data once, other tasks in the same TaskManager pick it up from there. The current receiver-initialted pull model is actually a good abstraction for that, I think. Lets look at (1): - Currently, the TaskManagers have a separate intermediate result partition for each target slot. They should rather have one intermediate result partition (saves also repeated serialization) that is consumed multiple times. - Since the results that are to be broadcasted are always "blocking", they can be consumed (pulled) multiples times. Lets look at (2): - The current BroadcastVariableManager has the functionality to let the first accessor of the BC-variable materialize the result. - It could be changed such that only the first accessor creates a RecordReader, so the others do not even request the stream. That way, the TaskManager should pull only one stream from each producing task, which means the data is transferred once. That would also work perfectly with the current failure / recovery model. What do you think? Stephan On Fri, Jul 22, 2016 at 2:59 PM, Felix Neutatz <neut...@googlemail.com> wrote: > Hi everybody, > > I want to improve the performance of broadcasts in Flink. Therefore Till > told me to start a FLIP on this topic to discuss how to go forward to solve > the current issues for broadcasts. > > The problem in a nutshell: Instead of sending data to each taskmanager only > once, at the moment the data is sent to each task. This means if there are > 3 slots on each taskmanager we will send the data 3 times instead of once. > > There are multiple ways to tackle this problem and I started to do some > research and investigate. You can follow my thought process here: > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts > > This is my first FLIP. So please correct me, if I did something wrong. > > I am interested in your thoughts about how to solve this issue. Do you > think my approach is heading into the right direction or should we follow a > totally different one. > > I am happy about any comment :) > > Best regards, > Felix >