[ https://issues.apache.org/jira/browse/FLINK-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vasia Kalavri updated FLINK-5127: --------------------------------- Affects Version/s: 1.1.0 1.2.0 > Reduce the amount of intermediate data in vertex-centric iterations > ------------------------------------------------------------------- > > Key: FLINK-5127 > URL: https://issues.apache.org/jira/browse/FLINK-5127 > Project: Flink > Issue Type: Improvement > Components: Gelly > Affects Versions: 1.1.0, 1.2.0 > Reporter: Vasia Kalavri > Assignee: Vasia Kalavri > Fix For: 1.3.0 > > > The vertex-centric plan contains a join between the workset (messages) and > the solution set (vertices) that outputs <Vertex, Message> tuples. This > intermediate dataset is then co-grouped with the edges to provide the Pregel > interface directly. > This issue proposes an improvement to reduce the size of this intermediate > dataset. In particular, the vertex state does not have to be attached to all > the output tuples of the join. If we replace the join with a coGroup and use > an `Either` type, we can attach the vertex state to the first tuple only. The > subsequent coGroup can retrieve the vertex state from the first tuple and > correctly expose the Pregel interface. > In my preliminary experiments, I find that this change reduces intermediate > data by 2x for small vertex state and 4-5x for large vertex states. -- This message was sent by Atlassian JIRA (v6.3.15#6346)