[jira] [Created] (FLINK-4175) Broadcast data sent increases with # slots per TM

Felix Neutatz (JIRA) Thu, 07 Jul 2016 16:43:15 -0700

Felix Neutatz created FLINK-4175:
------------------------------------

             Summary: Broadcast data sent increases with # slots per TM
                 Key: FLINK-4175
                 URL: https://issues.apache.org/jira/browse/FLINK-4175
             Project: Flink
          Issue Type: Improvement
          Components: Core, TaskManager
    Affects Versions: 1.0.3
            Reporter: Felix Neutatz
            Assignee: Felix Neutatz



Problem:
We observe an unexpected increase in the data sent over the network for 
broadcasts as the number of slots per TaskManager increases.


We provide a benchmark [1]. More slots not only increase the amount of data 
sent over the network but also hurt performance, as the preliminary results 
below show. In these results, cloud-11 has 25 nodes and ibm-power has 8 nodes, 
and the number of slots per node is scaled from 1 to 16.


+-----------------------+--------------+-------------+
| suite                 | name         | median_time |
+=======================+==============+=============+
| broadcast.cloud-11    | broadcast.01 |        8796 |
| broadcast.cloud-11    | broadcast.02 |       14802 |
| broadcast.cloud-11    | broadcast.04 |       30173 |
| broadcast.cloud-11    | broadcast.08 |       56936 |
| broadcast.cloud-11    | broadcast.16 |      117507 |
| broadcast.ibm-power-1 | broadcast.01 |        6807 |
| broadcast.ibm-power-1 | broadcast.02 |        8443 |
| broadcast.ibm-power-1 | broadcast.04 |       11823 |
| broadcast.ibm-power-1 | broadcast.08 |       21655 |
| broadcast.ibm-power-1 | broadcast.16 |       37426 |
+-----------------------+--------------+-------------+
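To make the trend in the table easier to read, the following small sketch 
divides each median time by the 1-slot median time of the same suite. The 
numbers are copied from the table; the reading of the name suffix (01, 02, ...) 
as the number of slots per node is an assumption based on the description 
above, and the names MEDIAN_TIME and slowdown are only illustrative.

```python
# Median times copied from the table above (units as reported by the benchmark).
# Assumption: the suffix of each benchmark name is the number of slots per node.
MEDIAN_TIME = {
    "cloud-11":    {1: 8796, 2: 14802, 4: 30173, 8: 56936, 16: 117507},
    "ibm-power-1": {1: 6807, 2: 8443, 4: 11823, 8: 21655, 16: 37426},
}


def slowdown(times):
    """Return each median time divided by the 1-slot median time."""
    base = times[1]
    return {slots: t / base for slots, t in times.items()}


if __name__ == "__main__":
    for suite, times in MEDIAN_TIME.items():
        for slots, factor in slowdown(times).items():
            print(f"{suite:12s} slots={slots:2d} slowdown={factor:5.2f}x")
```

On cloud-11 the 16-slot run takes roughly 13 times as long as the 1-slot run; 
on ibm-power-1 it takes roughly 5.5 times as long.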



After looking into the code base, it seems that the data is de-serialized 
only once per TM, but the actual data is still sent to every slot running the 
operator with broadcast variables and is simply discarded if it has already 
been de-serialized.
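The behaviour described above can be illustrated with a deliberately simple 
traffic model. It is a sketch under stated assumptions, not a measurement: it 
assumes every slot on every TM runs the operator with the broadcast variable, 
and the function name, its parameters and the 100 MB payload are hypothetical.

```python
def broadcast_bytes_sent(payload_bytes, task_managers, slots_per_tm, shared_per_tm):
    """Estimate network bytes needed to deliver one broadcast variable.

    shared_per_tm=False models the observed behaviour: one copy per slot.
    shared_per_tm=True models one copy per TM, shared by all of its slots.
    """
    copies_per_tm = 1 if shared_per_tm else slots_per_tm
    return payload_bytes * task_managers * copies_per_tm


if __name__ == "__main__":
    payload = 100 * 1024 * 1024  # hypothetical 100 MB broadcast variable
    for slots in (1, 2, 4, 8, 16):
        per_slot = broadcast_bytes_sent(payload, 25, slots, shared_per_tm=False)
        per_tm = broadcast_bytes_sent(payload, 25, slots, shared_per_tm=True)
        print(f"slots={slots:2d} per-slot={per_slot} per-TM={per_tm}")
```

In this model the per-slot traffic grows linearly with the number of slots, 
while the per-TM traffic stays constant.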


We see no reason why the data cannot be shared among the slots of a TM and 
therefore sent only once per TM.
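One way to picture the proposed sharing is a per-TM cache in which the first 
slot that asks for a broadcast variable triggers the transfer and all other 
slots reuse the result. The following is a minimal sketch of that idea using 
threads as stand-ins for slots. It is not Flink code; the class 
TaskManagerBroadcastCache and the function simulate are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class TaskManagerBroadcastCache:
    """Hypothetical per-TM cache: the first slot fetches, the others reuse."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, Any] = {}

    def get(self, name: str, fetch: Callable[[], Any]) -> Any:
        # The lock is held while fetching, so concurrent slots wait for the
        # first transfer instead of starting their own.
        with self._lock:
            if name not in self._values:
                self._values[name] = fetch()
            return self._values[name]


def simulate(slots: int) -> int:
    """Run `slots` threads against one cache; return the number of transfers."""
    cache = TaskManagerBroadcastCache()
    transfers = []

    def fetch() -> list:
        transfers.append(1)  # stands in for one network transfer
        return list(range(1000))

    threads = [
        threading.Thread(target=cache.get, args=("broadcast-var", fetch))
        for _ in range(slots)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(transfers)


if __name__ == "__main__":
    print(simulate(16))
```

Holding a single lock during the fetch keeps the sketch short; a real 
implementation would probably lock per variable so that unrelated broadcasts 
do not block each other.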

[1] https://github.com/TU-Berlin-DIMA/flink-broadcast

This Jira will continue the discussion started here: 
https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%[email protected]%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
