Hi,
we are experiencing an unexpected increase in the amount of data sent over the network for broadcasts as the number of slots per TaskManager grows. We have provided a benchmark [1]. This not only increases the volume of data sent over the network but also hurts performance, as seen in the preliminary results below. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, with the number of slots per node scaled from 1 to 16.

+-----------------------+--------------+-------------+
| suite                 | name         | median_time |
+=======================+==============+=============+
| broadcast.cloud-11    | broadcast.01 |        8796 |
| broadcast.cloud-11    | broadcast.02 |       14802 |
| broadcast.cloud-11    | broadcast.04 |       30173 |
| broadcast.cloud-11    | broadcast.08 |       56936 |
| broadcast.cloud-11    | broadcast.16 |      117507 |
| broadcast.ibm-power-1 | broadcast.01 |        6807 |
| broadcast.ibm-power-1 | broadcast.02 |        8443 |
| broadcast.ibm-power-1 | broadcast.04 |       11823 |
| broadcast.ibm-power-1 | broadcast.08 |       21655 |
| broadcast.ibm-power-1 | broadcast.16 |       37426 |
+-----------------------+--------------+-------------+

After looking into the code base, it seems that the data is deserialized only once per TM, but the actual data is sent for every slot running the operator with broadcast variables and is simply discarded when it has already been deserialized. I do not see a reason the data cannot be shared among the slots of a TM and therefore sent only once, but I guess it would require quite some changes to how broadcast sets are currently handled.

Are there any future plans regarding this, and/or is there interest in this "feature"?

Best,
Andreas

[1] https://github.com/TU-Berlin-DIMA/flink-broadcast
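P.S. A minimal back-of-the-envelope sketch of the behavior described above, in case it helps the discussion. The numbers here are illustrative, not taken from the benchmark, and the function name is hypothetical; it just models "one copy per slot" versus "one copy per TM":

```python
def broadcast_bytes(broadcast_mb, num_tms, slots_per_tm, per_slot=True):
    """MB shipped for one broadcast set under the stated model.

    per_slot=True  -> current behavior: a copy is sent for every slot
                      (extra copies are deserialized once and discarded).
    per_slot=False -> proposed behavior: one copy per TaskManager,
                      shared among its slots.
    """
    copies = num_tms * (slots_per_tm if per_slot else 1)
    return broadcast_mb * copies

# cloud-11-like setup: 25 nodes, slots per node scaled 1 -> 16,
# assuming a 100 MB broadcast set (illustrative size).
for slots in (1, 2, 4, 8, 16):
    current = broadcast_bytes(100, 25, slots, per_slot=True)
    shared = broadcast_bytes(100, 25, slots, per_slot=False)
    print(f"slots={slots:2d}  per-slot={current:6d} MB  per-TM={shared:6d} MB")
```

Under this model the per-slot traffic grows linearly with the slot count while the per-TM traffic stays constant, which matches the roughly linear growth of the median times in the table.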