Hi,

yes, I was referring to the parallel Boruvka algorithm. There are several
ways to implement this one in Flink and I believe that the one described in
the paper (vertex-centric) is not the most elegant one :)

Andra is now working on an idea that uses the delta iteration abstraction
and we believe that it will be both more efficient and easier to
understand. It has the edges in the solution set and the vertices in the
workset, so it follows the pattern I describe in (2) in my previous e-mail.
As a next step, we would like to see how having an iteration operator that
could update the whole graph -what I describe as (3)- would make this even
nicer.

Any ideas are highly welcome!

Cheers,
V.

On 22 February 2015 at 16:32, Andra Lungu <lungu.an...@gmail.com> wrote:

> Hi Alex,
>
> Vasia is talking about the second version(presented Friday) of Parallel
> Boruvka, which can be found here:
> https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59
>
> I will propose the third, non-Pregel like approach directly to Gelly soon.
>
> If you have additional questions, I will be happy to answer them.
>
> Andra
>
> On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov <
> alexander.s.alexand...@gmail.com> wrote:
>
> > Hi Vasia,
> >
> > I am trying to look at the problem in more detail. Which version of the
> MST
> > are you talking about?
> >
> > Right now in the Gelly repository I can only find the SSSP example
> > (parallel Bellman-Ford) from Section 4.2 in [1].
> >
> > However, it seems that the issues encountered by Andra are related to the
> > implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct?
> >
> > Regards,
> > A.
> >
> > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf
> > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf
> >
> > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:
> >
> > > Hello beautiful Flink people,
> > >
> > > during the past few days, Andra and I have been discussing about how to
> > > extend Gelly's iteration methods.
> > >
> > > Alexander's course (and his awesome students) has made it obvious that
> > > vertex-centric iterations are not the best fit for algorithms which
> don't
> > > follow the common "propagate-update" pattern. For example, Andra is
> > working
> > > on an implementation of Minimum Spanning Tree, which requires branching
> > > inside an iteration and also requires a convergence check of an
> internal
> > > iteration. Others also reported similar issues [1, 2]. Trying to fit
> such
> > > algorithms to the vertex-centric model leads to long and ugly code,
> e.g.
> > > aggregators to keep track of algorithm phases, duplicating data, etc.
> > >
> > > One limitation of the vertex-centric and the upcoming GAS model is that
> > > they both only allow the vertex values to be updated in each iteration.
> > > However, for some algorithms we need to update the edge values and in
> > > others we need to update both. In even more complex situations (like
> > > Andra's MST) in some iterations we need to update the vertex values and
> > in
> > > some iterations we need to update the edge values.
> > > Another problem is that we currently don't have a way to allow
> different
> > > computational phases inside an iteration. This is something that Giraph
> > > solves with master compute, a function that is executed once before
> each
> > > superstep and sets the computation function.
> > >
> > > All that said, I believe that we can solve most of these issues if we
> > > nicely expose Flink's iteration operators in Gelly. I can see the
> > following
> > > cases:
> > >
> > > 1. Bulk & delta iterations where the solution set is the vertex
> dataset:
> > > this will be similar to vertex-centric and GAS, but will allow more
> > > flexible dataflows inside the iteration.
> > > 2. Bulk & delta iterations where the solution set is the edge dataset:
> > for
> > > the cases where we need to update edge values.
> > > 3. Bulk & delta iterations where the solution set is the Graph: this
> will
> > > cover more complex cases, where the algorithm updates both vertices and
> > > edges or even adds/removes vertices/edges, i.e. updates the whole
> Graph.
> > >
> > > What do you think? I can see 1 & 2 being very easy to implement, but I
> > > suspect 3 won't be that easy (but so awesome to have ^^).
> > > Would it work the way a Graph is represented now, i.e. with 2 DataSets?
> > >
> > > Any comment, idea, pointer would be much appreciated! Thank you ^^
> > >
> > > Cheers,
> > > -V.
> > >
> > > [1]:
> > >
> > >
> >
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html
> > > [2]:
> > >
> > >
> >
> http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769
> > >
> >
>

Reply via email to