Re: Design document for FLINK-2254

Saumitra Shahapure Thu, 22 Oct 2015 04:02:11 -0700

Hi Vasia,

Here
<https://docs.google.com/document/d/1SChonW4bXNiqU2Pz9PPjuKDuKrBGmESrPcJXamj4dlI/edit?usp=sharing>
is the Google doc.


-- Saumitra S. Shahapure

On Thu, Oct 22, 2015 at 11:10 AM, Vasiliki Kalavri <
vasilikikala...@gmail.com> wrote:

> Hi Saumitra,
>
> could you maybe add your ideas above in a google doc and share the link, so
> we can easily comment and/or edit?
>
> Thank you!
> -Vasia.
>
> On 22 October 2015 at 10:53, Andra Lungu <lungu.an...@gmail.com> wrote:
>
> > Hi Saumitra,
> >
> > As you already noticed, the first version (with duplicates) is highly
> > inefficient and consumes a lot of memory. So, I suggest we drop it for
> now.
> > The version with the label makes a lot of modifications on the base Graph
> > class, and this, in my opinion would make it more difficult to grasp.
> > Bipartite Graphs are not as common as regular graphs. And then, when you
> > would add an Isomorphic Graph class (stupid example), you'll need to
> strip
> > stuff out again.
> >
> > I believe we should go with a BipartiteGraph class which extends Graph
> and
> > which has a DataSet of topVertices and a DataSet of bottomVertices. Apart
> > from getTopVertices() methods, we would still need functions such as
> > getNumberOfTopVertices(). For the time being, just support the funstiones
> > mentiones as well as the basics: fromCollection, fromDataSet, etc, get*,
> > join*, map*, filter*.
> >
> > Wait for a few more opinions and dig in. I believe we should discuss this
> > even further and propose Jiras for examples of algos for bipartite
> graphs.
> >
> > Once we have the design specs well figured out, and if I find some time
> > I'll gladly chime in :)
> >
> > Cheers!
> > Andra
> >
> > On Wed, Oct 21, 2015 at 8:45 PM, Saumitra Shahapure <
> > saumitra.offic...@gmail.com> wrote:
> >
> > > In FLINK-2254 <https://issues.apache.org/jira/browse/FLINK-2254> , we
> > > want to extend Graph API of Gelly by adding support for bipartite
> graphs
> > > too. In the long term, the support for newer graph types can be added
> in
> > > the same way. Also specialised algorithms for special types of graphs
> > > should be implemented.
> > >
> > > For bipartite graph, a new class BipartiteGraph can be implemented
> which
> > > extends Graph. Other graph algorithms which are written for Graph can
> be
> > > used for BipartiteGraph too, because of inheritance.
> > > Initialisation functions like  public static <K, VV, EV> Graph<K, VV,
> EV>
> > > fromCollection(Collection<Vertex<K, VV>> vertices,Collection<Edge<K,
> EV>>
> > > edges, ExecutionEnvironment context) need to be duplicated as public
> > static
> > > <K, VV, EV> BipartiteGraph<K, VV, EV>
> fromCollection(Collection<Vertex<K,
> > > VV>> topVertices,Collection<Vertex<K, VV>> bottomVertices,
> > > Collection<Edge<K, EV>> edges, ExecutionEnvironment context)
> > > Graph validation does not need to happen implicitly. user can call
> > > BipartiteGraph.validate() in case he wants to check sanity of data.
> > > Vertex modes is primary requirement. BipartiteGraph should have
> > functions,
> > > getTopVertices() and getBottomVertices(). There are three ways to
> > implement
> > > it:
> > > Simplest and least efficient way is to maintain vertices variable in
> > Graph
> > > in addition to two more Datasets topVertices, bottomVertices. Benefit
> of
> > > this approach is that Graph class would not need any modification at
> all.
> > > this.vertices variable access inside Graph class would work correctly.
> > > Disadvantage is that multiple copies of vertex dataset are created and
> > > maintained. So this is inefficient in terms of memory.
> > > Another way is to keep topVertices and bottomVertices variables in
> > > BipartiteGraph. vertices variable in Graph would stay empty.
> > getVertices()
> > > function of Graph would be overloaded by BipartiteGraph and
> reimplemented
> > > as union of of topVertices and bottomVertices. Since, vertices is a
> > private
> > > variable, external view of vertices stays unaffected. All functions of
> > > Graph class which use vertices local variable (e.g.
> > getVerticesAsTuple2())
> > > need to use getVertices() instead of this.vertices. Disadvantage of
> this
> > > method is Graph.vertices variable would have invalid value throughout
> for
> > > BipartiteGraph.
> > > Another way is to ‘label’ vertices with an integer. So public class
> > > BipartiteGraph<K,VV,EV> extends Graph<K, Tuple2<VV,Integer>, EV> would
> be
> > > the signature. Initialisers would tag the vertices according to their
> > mode.
> > > getVertices() should be overridden to strip-off the tag values so that
> > > users would get consistent view of vertex dataset for Graph and
> > > BipartiteGraph. getTopVertices/getBottomVertices would be filter
> > functions
> > > on vertex tags.
> > > I personally feel method 2 to be most elegant.
> > > Functions like addVertices(vertexList) are not relevant without
> > specifying
> > > whether they are top or bottom. A possibility could be to add them to
> > > topVertices by default. And have overloaded function
> > > addVertices(vertexList, mode) to add them to specific partition.
> > > Specialised algorithms for Bipartite graphs can implement new
> > > BipartiteGraphAlgorithm interface. It would be similar to
> GraphAlgorithm.
> > > In future, newer types of graphs can be similarly derived from Graph
> and
> > > type hierarchy of Graph can be created. Class hierarchy would be better
> > > here rather than interface hierarchy, because the functions which are
> > > applicable to all derived classes can be implemented in the base class.
> > > As a part of future refactoring, Graph transformation functions should
> > > rather be a new class implementing GraphAlgorithm rather than function
> in
> > > Graph class. This may make implementing complex graph types easier.
> > > PS. Do I need to add it in wiki too? I can copy the contents to wiki if
> > > you say so,
> > > -- Saumitra Shahapure
> > >
> > >
> >
>

Re: Design document for FLINK-2254

Reply via email to