Re: [DISCUSS] Gelly iteration abstractions

Andra Lungu Sun, 22 Feb 2015 07:36:06 -0800

Hi Alex,

Vasia is talking about the second version(presented Friday) of Parallel
Boruvka, which can be found here:
https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59


I will propose the third, non-Pregel like approach directly to Gelly soon.

If you have additional questions, I will be happy to answer them.

Andra

On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov <
alexander.s.alexand...@gmail.com> wrote:

> Hi Vasia,
>
> I am trying to look at the problem in more detail. Which version of the MST
> are you talking about?
>
> Right now in the Gelly repository I can only find the SSSP example
> (parallel Bellman-Ford) from Section 4.2 in [1].
>
> However, it seems that the issues encountered by Andra are related to the
> implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct?
>
> Regards,
> A.
>
> [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
> [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf
>
> 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
>
> > Hello beautiful Flink people,
> >
> > during the past few days, Andra and I have been discussing about how to
> > extend Gelly's iteration methods.
> >
> > Alexander's course (and his awesome students) has made it obvious that
> > vertex-centric iterations are not the best fit for algorithms which don't
> > follow the common "propagate-update" pattern. For example, Andra is
> working
> > on an implementation of Minimum Spanning Tree, which requires branching
> > inside an iteration and also requires a convergence check of an internal
> > iteration. Others also reported similar issues [1, 2]. Trying to fit such
> > algorithms to the vertex-centric model leads to long and ugly code, e.g.
> > aggregators to keep track of algorithm phases, duplicating data, etc.
> >
> > One limitation of the vertex-centric and the upcoming GAS model is that
> > they both only allow the vertex values to be updated in each iteration.
> > However, for some algorithms we need to update the edge values and in
> > others we need to update both. In even more complex situations (like
> > Andra's MST) in some iterations we need to update the vertex values and
> in
> > some iterations we need to update the edge values.
> > Another problem is that we currently don't have a way to allow different
> > computational phases inside an iteration. This is something that Giraph
> > solves with master compute, a function that is executed once before each
> > superstep and sets the computation function.
> >
> > All that said, I believe that we can solve most of these issues if we
> > nicely expose Flink's iteration operators in Gelly. I can see the
> following
> > cases:
> >
> > 1. Bulk & delta iterations where the solution set is the vertex dataset:
> > this will be similar to vertex-centric and GAS, but will allow more
> > flexible dataflows inside the iteration.
> > 2. Bulk & delta iterations where the solution set is the edge dataset:
> for
> > the cases where we need to update edge values.
> > 3. Bulk & delta iterations where the solution set is the Graph: this will
> > cover more complex cases, where the algorithm updates both vertices and
> > edges or even adds/removes vertices/edges, i.e. updates the whole Graph.
> >
> > What do you think? I can see 1 & 2 being very easy to implement, but I
> > suspect 3 won't be that easy (but so awesome to have ^^).
> > Would it work the way a Graph is represented now, i.e. with 2 DataSets?
> >
> > Any comment, idea, pointer would be much appreciated! Thank you ^^
> >
> > Cheers,
> > -V.
> >
> > [1]:
> >
> >
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> > [2]:
> >
> >
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769
> >
>

Re: [DISCUSS] Gelly iteration abstractions

Reply via email to