Hi Saumitra, could you maybe add your ideas above in a google doc and share the link, so we can easily comment and/or edit?
Thank you! -Vasia. On 22 October 2015 at 10:53, Andra Lungu <lungu.an...@gmail.com> wrote: > Hi Saumitra, > > As you already noticed, the first version (with duplicates) is highly > inefficient and consumes a lot of memory. So, I suggest we drop it for now. > The version with the label makes a lot of modifications on the base Graph > class, and this, in my opinion would make it more difficult to grasp. > Bipartite Graphs are not as common as regular graphs. And then, when you > would add an Isomorphic Graph class (stupid example), you'll need to strip > stuff out again. > > I believe we should go with a BipartiteGraph class which extends Graph and > which has a DataSet of topVertices and a DataSet of bottomVertices. Apart > from getTopVertices() methods, we would still need functions such as > getNumberOfTopVertices(). For the time being, just support the funstiones > mentiones as well as the basics: fromCollection, fromDataSet, etc, get*, > join*, map*, filter*. > > Wait for a few more opinions and dig in. I believe we should discuss this > even further and propose Jiras for examples of algos for bipartite graphs. > > Once we have the design specs well figured out, and if I find some time > I'll gladly chime in :) > > Cheers! > Andra > > On Wed, Oct 21, 2015 at 8:45 PM, Saumitra Shahapure < > saumitra.offic...@gmail.com> wrote: > > > In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we > > want to extend Graph API of Gelly by adding support for bipartite graphs > > too. In the long term, the support for newer graph types can be added in > > the same way. Also specialised algorithms for special types of graphs > > should be implemented. > > > > For bipartite graph, a new class BipartiteGraph can be implemented which > > extends Graph. Other graph algorithms which are written for Graph can be > > used for BipartiteGraph too, because of inheritance. > > Initialisation functions like public static <K, VV, EV> Graph<K, VV, EV> > > fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K, EV>> > > edges, ExecutionEnvironment context) need to be duplicated as public > static > > <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(Collection<Vertex<K, > > VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices, > > Collection<Edge<K, EV>> edges, ExecutionEnvironment context) > > Graph validation does not need to happen implicitly. user can call > > BipartiteGraph.validate() in case he wants to check sanity of data. > > Vertex modes is primary requirement. BipartiteGraph should have > functions, > > getTopVertices() and getBottomVertices(). There are three ways to > implement > > it: > > Simplest and least efficient way is to maintain vertices variable in > Graph > > in addition to two more Datasets topVertices, bottomVertices. Benefit of > > this approach is that Graph class would not need any modification at all. > > this.vertices variable access inside Graph class would work correctly. > > Disadvantage is that multiple copies of vertex dataset are created and > > maintained. So this is inefficient in terms of memory. > > Another way is to keep topVertices and bottomVertices variables in > > BipartiteGraph. vertices variable in Graph would stay empty. > getVertices() > > function of Graph would be overloaded by BipartiteGraph and reimplemented > > as union of of topVertices and bottomVertices. Since, vertices is a > private > > variable, external view of vertices stays unaffected. All functions of > > Graph class which use vertices local variable (e.g. > getVerticesAsTuple2()) > > need to use getVertices() instead of this.vertices. Disadvantage of this > > method is Graph.vertices variable would have invalid value throughout for > > BipartiteGraph. > > Another way is to ‘label’ vertices with an integer. So public class > > BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> would be > > the signature. Initialisers would tag the vertices according to their > mode. > > getVertices() should be overridden to strip-off the tag values so that > > users would get consistent view of vertex dataset for Graph and > > BipartiteGraph. getTopVertices/getBottomVertices would be filter > functions > > on vertex tags. > > I personally feel method 2 to be most elegant. > > Functions like addVertices(vertexList) are not relevant without > specifying > > whether they are top or bottom. A possibility could be to add them to > > topVertices by default. And have overloaded function > > addVertices(vertexList, mode) to add them to specific partition. > > Specialised algorithms for Bipartite graphs can implement new > > BipartiteGraphAlgorithm interface. It would be similar to GraphAlgorithm. > > In future, newer types of graphs can be similarly derived from Graph and > > type hierarchy of Graph can be created. Class hierarchy would be better > > here rather than interface hierarchy, because the functions which are > > applicable to all derived classes can be implemented in the base class. > > As a part of future refactoring, Graph transformation functions should > > rather be a new class implementing GraphAlgorithm rather than function in > > Graph class. This may make implementing complex graph types easier. > > PS. Do I need to add it in wiki too? I can copy the contents to wiki if > > you say so, > > -- Saumitra Shahapure > > > > >