Hi Fabian, I don't know if this has been looked at. There is discussion of BipartiteGraph in FLINK-2254. If Gelly had DirectedGraph and UndirectedGraph then the API could stay lean while methods could be tuned for the specific graph type. I do like having simple APIs such as DataSet and Graph for for interactive use in small scripts or Scala and Python shells.
Greg On Mon, Apr 25, 2016 at 8:46 AM, Fabian Hueske <fhue...@gmail.com> wrote: > Hi Greg and Vasia, > > thanks for starting this discussion. > I think it is a good idea to update the Gelly roadmap. As Vasia said, many > items on the list have been implemented and other have been more or less > dropped. > Also new persons who want to improve Gelly have joint the community while > others have become idle. > So from my point of view it makes perfect sense to gather the input of the > community and update the Gelly roadmap. > > Regarding the specific changes of FLINK-3772, I am usually always in favor > of performance improvements. > How much would the suggested functionality for degree computation overlaps > with the current methods? > Could the current methods be replaced by the more efficient implementations > or would there be two methods which look very similar and behave almost the > same? > > Best, Fabian > > > > 2016-04-22 11:00 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>: > > > Hi all, > > > > I asked Greg to start a discussion here about FLINK-3771 and FLINK-3772, > to > > make sure we're all on the same page about future Gelly development. > > > > About a year ago we created the Gelly roadmap [1]. Many of these items > have > > been implemented and others were researched and either developed > externally > > or dropped. Graph translators and degree annotations were not in the > > roadmap, but I personally think that they are both helpful features and > I'm > > in favor of adding them to Gelly. > > > > That said, I find this a perfect timing for revisiting and updating our > > Gelly development plans. We should update the roadmap so that it reflects > > current state and future plans. This way, the community can have a clear > > picture of what we are working on and what are upcoming features. This is > > also very important for new contributors. They can always refer to the > > roadmap to see what the community is interested in and we can avoid > having > > people spending their time on obsolete or dropped features. We should > also > > go through existing JIRAs and clean them up. Several are assigned to > people > > who have been inactive or might need help. > > > > In the following days I will go through the roadmap and then start a new > > thread where we can discuss features and agree on future plans. In the > > meantime, it would be great if you give us your view on FLINK-3771 and > > FLINK-3772. > > > > Cheers, > > -Vasia. > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly > > > > On 21 April 2016 at 18:52, Greg Hogan <c...@greghogan.com> wrote: > > > > > Vasia and I are looking for additional feedback on FLINK-3772. This > > ticket > > > [0] and PR [1] provides a set of graph algorithms which compute and > store > > > the degree for vertices and edges. > > > > > > Degree annotation is a basic component of many algorithms. For example, > > > PageRank requires the vertex out-degree to evenly divide the vertex > score > > > among neighbors. The triangle enumeration algorithm in Gelly requires > > both > > > source and target degree for each edge as an optimization. The Jaccard > > and > > > Adamic-Adar similarities require the target degree for each edge. > > > > > > As discussed in the ticket, Graph has methods for outDegrees, > inDegrees, > > > and getDegrees. These are simple but difficult or impossible to use > > > efficiently. > > > > > > Stepping back, I believe algorithms should be composable and reused > where > > > possible, not only to improve Flink but also to support users. > > Implementing > > > algorithms as classes rather than Graph methods enables customization > and > > > optimization such as used here. > > > > > > One such optimization is CachingGraphAlgorithm which implicitly reuses > > > DataSets. There is overlap between algorithms. From this ticket, > > annotating > > > edges with source and target degree on an undirected graph can reuse > > vertex > > > degree. Local clustering coefficient requires a triangle listing and > > global > > > clustering coefficient requires a triangle count, there is no need to > > > generate the list three times. > > > > > > Further optimizations include the use of mutable types, reusing sort > > order, > > > avoidance of coGroup, configurable parallelism, and not unnecessarily > > > materializing DataSets. I see all this as the expectation for inclusion > > in > > > Flink. > > > > > > Greg > > > > > > [0] https://issues.apache.org/jira/browse/FLINK-3772 > > > [1] https://github.com/apache/flink/pull/1901/files > > > > > >