Vasia and I are looking for additional feedback on FLINK-3772. This ticket [0] and PR [1] provides a set of graph algorithms which compute and store the degree for vertices and edges.
Degree annotation is a basic component of many algorithms. For example, PageRank requires the vertex out-degree to evenly divide the vertex score among neighbors. The triangle enumeration algorithm in Gelly requires both source and target degree for each edge as an optimization. The Jaccard and Adamic-Adar similarities require the target degree for each edge. As discussed in the ticket, Graph has methods for outDegrees, inDegrees, and getDegrees. These are simple but difficult or impossible to use efficiently. Stepping back, I believe algorithms should be composable and reused where possible, not only to improve Flink but also to support users. Implementing algorithms as classes rather than Graph methods enables customization and optimization such as used here. One such optimization is CachingGraphAlgorithm which implicitly reuses DataSets. There is overlap between algorithms. From this ticket, annotating edges with source and target degree on an undirected graph can reuse vertex degree. Local clustering coefficient requires a triangle listing and global clustering coefficient requires a triangle count, there is no need to generate the list three times. Further optimizations include the use of mutable types, reusing sort order, avoidance of coGroup, configurable parallelism, and not unnecessarily materializing DataSets. I see all this as the expectation for inclusion in Flink. Greg [0] https://issues.apache.org/jira/browse/FLINK-3772 [1] https://github.com/apache/flink/pull/1901/files