Haibo Suen created FLINK-11256:
----------------------------------

             Summary: Referencing StreamNode objects directly in StreamEdge 
causes the sizes of JobGraph and TDD to become unnecessarily large
                 Key: FLINK-11256
                 URL: https://issues.apache.org/jira/browse/FLINK-11256
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.7.1, 1.7.0
            Reporter: Haibo Suen
            Assignee: Haibo Suen


When a job graph is generated from StreamGraph, StreamEdge(s) on the stream 
graph are serialized to StreamConfig and stored into the job graph. After that, 
the serialized bytes will be included in the TDD and distributed to TM. Because 
StreamEdge directly reference to StreamNode objects including sourceVertex and 
targetVertex, these objects are also written transitively on serializing 
StreamEdge. But these StreamNode objects are not needed at runtime. For a large 
size topology, this will causes JobGraph/TDD to become much larger than that 
actually need, and more likely to occur rpc timeout when transmitted.

In Streamedge, only the ID of StreamNode should be stored to avoid this 
situation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to