[ https://issues.apache.org/jira/browse/FLINK-11256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler updated FLINK-11256: ------------------------------------- Issue Type: Improvement (was: Bug) > Referencing StreamNode objects directly in StreamEdge causes the sizes of > JobGraph and TDD to become unnecessarily large > ------------------------------------------------------------------------------------------------------------------------ > > Key: FLINK-11256 > URL: https://issues.apache.org/jira/browse/FLINK-11256 > Project: Flink > Issue Type: Improvement > Affects Versions: 1.7.0, 1.7.1 > Reporter: Haibo Sun > Assignee: Haibo Sun > Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When a job graph is generated from StreamGraph, StreamEdge(s) on the stream > graph are serialized to StreamConfig and stored into the job graph. After > that, the serialized bytes will be included in the TDD and distributed to TM. > Because StreamEdge directly reference to StreamNode objects including > sourceVertex and targetVertex, these objects are also written transitively on > serializing StreamEdge. But these StreamNode objects are not needed in JM and > Task. For a large size topology, this will causes JobGraph/TDD to become much > larger than that actually need, and more likely to occur rpc timeout when > transmitted. > In StreamEdge, only the ID of StreamNode should be stored to avoid this > situation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)