[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751890#comment-15751890 ]
Vasia Kalavri commented on FLINK-5127: -------------------------------------- It'd be nice to have for 1.2, but I don't know when I'll have time to work on it. I'm hoping this weekend. > Reduce the amount of intermediate data in vertex-centric iterations > ------------------------------------------------------------------- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly > Reporter: Vasia Kalavri > Assignee: Vasia Kalavri > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states. -- This message was sent by Atlassian JIRA (v6.3.4#6332)