Some things may not work well as "closed-loop" iterations. Is it possible to express those as for-loop iterations?
On Mon, Feb 23, 2015 at 1:03 PM, Vasiliki Kalavri <vasilikikala...@gmail.com > wrote: > Hi Stephan, > > yes, this would work for the cases where an algorithm only updates the > vertex values or only updates the edge values. > > What we would like to also support is > (a) algorithms where both vertices and edges are updated in one iteration > (b) algorithms where the graph structure changes from one iteration to the > next and > (c) branching inside an iteration, i.e. executing a different "iteration > body" based on some condition. > > We can still currently implement those with regular Flink iteration > operators, but the resulting code is not that nice or efficient. > For example, if we want to update both edges and vertex values, we can > still create a solution set where the vertex values are attached to each > edge. > Regarding branching inside an iteration, we can use an aggregator that > tracks the iteration phase, but then we need to somehow make the different > phases to consist of the same operators and also check the branching > condition inside each UDF. > > Cheers, > V. > > > On 23 February 2015 at 11:05, Stephan Ewen <se...@apache.org> wrote: > > > As a workaround, it should always work to get the Edge and Vertex data > set > > from the graph and use the regular Fink iteration operators? > > > > > > On Sun, Feb 22, 2015 at 4:53 PM, Vasiliki Kalavri < > > vasilikikala...@gmail.com > > > wrote: > > > > > Hi, > > > > > > yes, I was referring to the parallel Boruvka algorithm. There are > several > > > ways to implement this one in Flink and I believe that the one > described > > in > > > the paper (vertex-centric) is not the most elegant one :) > > > > > > Andra is now working on an idea that uses the delta iteration > abstraction > > > and we believe that it will be both more efficient and easier to > > > understand. It has the edges in the solution set and the vertices in > the > > > workset, so it follows the pattern I describe in (2) in my previous > > e-mail. > > > As a next step, we would like to see how having an iteration operator > > that > > > could update the whole graph -what I describe as (3)- would make this > > even > > > nicer. > > > > > > Any ideas are highly welcome! > > > > > > Cheers, > > > V. > > > > > > On 22 February 2015 at 16:32, Andra Lungu <lungu.an...@gmail.com> > wrote: > > > > > > > Hi Alex, > > > > > > > > Vasia is talking about the second version(presented Friday) of > Parallel > > > > Boruvka, which can be found here: > > > > https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59 > > > > > > > > I will propose the third, non-Pregel like approach directly to Gelly > > > soon. > > > > > > > > If you have additional questions, I will be happy to answer them. > > > > > > > > Andra > > > > > > > > On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov < > > > > alexander.s.alexand...@gmail.com> wrote: > > > > > > > > > Hi Vasia, > > > > > > > > > > I am trying to look at the problem in more detail. Which version of > > the > > > > MST > > > > > are you talking about? > > > > > > > > > > Right now in the Gelly repository I can only find the SSSP example > > > > > (parallel Bellman-Ford) from Section 4.2 in [1]. > > > > > > > > > > However, it seems that the issues encountered by Andra are related > to > > > the > > > > > implementation of Parallel Boruvka (Section 3.2 in [2]). Is that > > > correct? > > > > > > > > > > Regards, > > > > > A. > > > > > > > > > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf > > > > > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf > > > > > > > > > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri < > > vasilikikala...@gmail.com > > > >: > > > > > > > > > > > Hello beautiful Flink people, > > > > > > > > > > > > during the past few days, Andra and I have been discussing about > > how > > > to > > > > > > extend Gelly's iteration methods. > > > > > > > > > > > > Alexander's course (and his awesome students) has made it obvious > > > that > > > > > > vertex-centric iterations are not the best fit for algorithms > which > > > > don't > > > > > > follow the common "propagate-update" pattern. For example, Andra > is > > > > > working > > > > > > on an implementation of Minimum Spanning Tree, which requires > > > branching > > > > > > inside an iteration and also requires a convergence check of an > > > > internal > > > > > > iteration. Others also reported similar issues [1, 2]. Trying to > > fit > > > > such > > > > > > algorithms to the vertex-centric model leads to long and ugly > code, > > > > e.g. > > > > > > aggregators to keep track of algorithm phases, duplicating data, > > etc. > > > > > > > > > > > > One limitation of the vertex-centric and the upcoming GAS model > is > > > that > > > > > > they both only allow the vertex values to be updated in each > > > iteration. > > > > > > However, for some algorithms we need to update the edge values > and > > in > > > > > > others we need to update both. In even more complex situations > > (like > > > > > > Andra's MST) in some iterations we need to update the vertex > values > > > and > > > > > in > > > > > > some iterations we need to update the edge values. > > > > > > Another problem is that we currently don't have a way to allow > > > > different > > > > > > computational phases inside an iteration. This is something that > > > Giraph > > > > > > solves with master compute, a function that is executed once > before > > > > each > > > > > > superstep and sets the computation function. > > > > > > > > > > > > All that said, I believe that we can solve most of these issues > if > > we > > > > > > nicely expose Flink's iteration operators in Gelly. I can see the > > > > > following > > > > > > cases: > > > > > > > > > > > > 1. Bulk & delta iterations where the solution set is the vertex > > > > dataset: > > > > > > this will be similar to vertex-centric and GAS, but will allow > more > > > > > > flexible dataflows inside the iteration. > > > > > > 2. Bulk & delta iterations where the solution set is the edge > > > dataset: > > > > > for > > > > > > the cases where we need to update edge values. > > > > > > 3. Bulk & delta iterations where the solution set is the Graph: > > this > > > > will > > > > > > cover more complex cases, where the algorithm updates both > vertices > > > and > > > > > > edges or even adds/removes vertices/edges, i.e. updates the whole > > > > Graph. > > > > > > > > > > > > What do you think? I can see 1 & 2 being very easy to implement, > > but > > > I > > > > > > suspect 3 won't be that easy (but so awesome to have ^^). > > > > > > Would it work the way a Graph is represented now, i.e. with 2 > > > DataSets? > > > > > > > > > > > > Any comment, idea, pointer would be much appreciated! Thank you > ^^ > > > > > > > > > > > > Cheers, > > > > > > -V. > > > > > > > > > > > > [1]: > > > > > > > > > > > > > > > > > > > > > > > > > > > http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html > > > > > > [2]: > > > > > > > > > > > > > > > > > > > > > > > > > > > http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769 > > > > > > > > > > > > > > > > > > > > >