In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we want to extend Graph API of Gelly by adding support for bipartite graphs too. In the long term, the support for newer graph types can be added in the same way. Also specialised algorithms for special types of graphs should be implemented.
For bipartite graph, a new class BipartiteGraph can be implemented which extends Graph. Other graph algorithms which are written for Graph can be used for BipartiteGraph too, because of inheritance. Initialisation functions like public static <K, VV, EV> Graph<K, VV, EV> fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K, EV>> edges, ExecutionEnvironment context) need to be duplicated as public static <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(Collection<Vertex<K, VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices, Collection<Edge<K, EV>> edges, ExecutionEnvironment context) Graph validation does not need to happen implicitly. user can call BipartiteGraph.validate() in case he wants to check sanity of data. Vertex modes is primary requirement. BipartiteGraph should have functions, getTopVertices() and getBottomVertices(). There are three ways to implement it: Simplest and least efficient way is to maintain vertices variable in Graph in addition to two more Datasets topVertices, bottomVertices. Benefit of this approach is that Graph class would not need any modification at all. this.vertices variable access inside Graph class would work correctly. Disadvantage is that multiple copies of vertex dataset are created and maintained. So this is inefficient in terms of memory. Another way is to keep topVertices and bottomVertices variables in BipartiteGraph. vertices variable in Graph would stay empty. getVertices() function of Graph would be overloaded by BipartiteGraph and reimplemented as union of of topVertices and bottomVertices. Since, vertices is a private variable, external view of vertices stays unaffected. All functions of Graph class which use vertices local variable (e.g. getVerticesAsTuple2()) need to use getVertices() instead of this.vertices. Disadvantage of this method is Graph.vertices variable would have invalid value throughout for BipartiteGraph. Another way is to ‘label’ vertices with an integer. So public class BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> would be the signature. Initialisers would tag the vertices according to their mode. getVertices() should be overridden to strip-off the tag values so that users would get consistent view of vertex dataset for Graph and BipartiteGraph. getTopVertices/getBottomVertices would be filter functions on vertex tags. I personally feel method 2 to be most elegant. Functions like addVertices(vertexList) are not relevant without specifying whether they are top or bottom. A possibility could be to add them to topVertices by default. And have overloaded function addVertices(vertexList, mode) to add them to specific partition. Specialised algorithms for Bipartite graphs can implement new BipartiteGraphAlgorithm interface. It would be similar to GraphAlgorithm. In future, newer types of graphs can be similarly derived from Graph and type hierarchy of Graph can be created. Class hierarchy would be better here rather than interface hierarchy, because the functions which are applicable to all derived classes can be implemented in the base class. As a part of future refactoring, Graph transformation functions should rather be a new class implementing GraphAlgorithm rather than function in Graph class. This may make implementing complex graph types easier. PS. Do I need to add it in wiki too? I can copy the contents to wiki if you say so, -- Saumitra Shahapure