I see, thanks a lot for the answers! To rephrase my original question, would it make sense to define a closed-loop iteration where the state is the whole graph?
If you want to take a look at the current implementation of DMST using delta iteration, Andra has made a PR [1]. On a high-level, this algorithm does (more or less) the following: while there are more than one vertices in the graph for every vertex select the min-weight edge add all selected edges to the MST collapse vertices with the same root into one vertex (iterative connected components-like step) What we are thinking is that we could express this with -maybe- a bulk-like iteration on the Graph, inside which we can use Gelly methods. Would it make sense or shall we go for a for-loop implementation instead? Thanks! -V. [1]: http://github.com/apache/flink/pull/434 On 23 February 2015 at 16:18, Stephan Ewen <se...@apache.org> wrote: > Closed-loop iterations are much more efficient right now. Long for loops > suffer from memory fragmentation (an issue that is in the list to fix). > > Also, closed loops can be stateful (delta iterations) and do not require > task re-deployment. > > On Mon, Feb 23, 2015 at 4:15 PM, Vasiliki Kalavri < > vasilikikala...@gmail.com > > wrote: > > > I see that's cool :-) > > So, what is the advantage of closed-loop versus for-loop iterations? > > Custom convergence criteria / aggregators and more efficient execution > > plans? > > > > On 23 February 2015 at 15:01, Stephan Ewen <se...@apache.org> wrote: > > > > > For loops are basically rolled out - they yield long execution plans. > > > > > > On Mon, Feb 23, 2015 at 2:44 PM, Vasiliki Kalavri < > > > vasilikikala...@gmail.com > > > > wrote: > > > > > > > for-loop iterations could cover some cases, I guess, when the number > of > > > > iterations is known beforehand. > > > > Are there currently any restrictions on what can be used inside a > > > for-loop? > > > > How are they translated into execution plans? > > > > > > > > On 23 February 2015 at 13:08, Stephan Ewen <se...@apache.org> wrote: > > > > > > > > > Some things may not work well as "closed-loop" iterations. > > > > > > > > > > Is it possible to express those as for-loop iterations? > > > > > > > > > > On Mon, Feb 23, 2015 at 1:03 PM, Vasiliki Kalavri < > > > > > vasilikikala...@gmail.com > > > > > > wrote: > > > > > > > > > > > Hi Stephan, > > > > > > > > > > > > yes, this would work for the cases where an algorithm only > updates > > > the > > > > > > vertex values or only updates the edge values. > > > > > > > > > > > > What we would like to also support is > > > > > > (a) algorithms where both vertices and edges are updated in one > > > > iteration > > > > > > (b) algorithms where the graph structure changes from one > iteration > > > to > > > > > the > > > > > > next and > > > > > > (c) branching inside an iteration, i.e. executing a different > > > > "iteration > > > > > > body" based on some condition. > > > > > > > > > > > > We can still currently implement those with regular Flink > iteration > > > > > > operators, but the resulting code is not that nice or efficient. > > > > > > For example, if we want to update both edges and vertex values, > we > > > can > > > > > > still create a solution set where the vertex values are attached > to > > > > each > > > > > > edge. > > > > > > Regarding branching inside an iteration, we can use an aggregator > > > that > > > > > > tracks the iteration phase, but then we need to somehow make the > > > > > different > > > > > > phases to consist of the same operators and also check the > > branching > > > > > > condition inside each UDF. > > > > > > > > > > > > Cheers, > > > > > > V. > > > > > > > > > > > > > > > > > > On 23 February 2015 at 11:05, Stephan Ewen <se...@apache.org> > > wrote: > > > > > > > > > > > > > As a workaround, it should always work to get the Edge and > Vertex > > > > data > > > > > > set > > > > > > > from the graph and use the regular Fink iteration operators? > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 22, 2015 at 4:53 PM, Vasiliki Kalavri < > > > > > > > vasilikikala...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > yes, I was referring to the parallel Boruvka algorithm. There > > are > > > > > > several > > > > > > > > ways to implement this one in Flink and I believe that the > one > > > > > > described > > > > > > > in > > > > > > > > the paper (vertex-centric) is not the most elegant one :) > > > > > > > > > > > > > > > > Andra is now working on an idea that uses the delta iteration > > > > > > abstraction > > > > > > > > and we believe that it will be both more efficient and easier > > to > > > > > > > > understand. It has the edges in the solution set and the > > vertices > > > > in > > > > > > the > > > > > > > > workset, so it follows the pattern I describe in (2) in my > > > previous > > > > > > > e-mail. > > > > > > > > As a next step, we would like to see how having an iteration > > > > operator > > > > > > > that > > > > > > > > could update the whole graph -what I describe as (3)- would > > make > > > > this > > > > > > > even > > > > > > > > nicer. > > > > > > > > > > > > > > > > Any ideas are highly welcome! > > > > > > > > > > > > > > > > Cheers, > > > > > > > > V. > > > > > > > > > > > > > > > > On 22 February 2015 at 16:32, Andra Lungu < > > lungu.an...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi Alex, > > > > > > > > > > > > > > > > > > Vasia is talking about the second version(presented Friday) > > of > > > > > > Parallel > > > > > > > > > Boruvka, which can be found here: > > > > > > > > > https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59 > > > > > > > > > > > > > > > > > > I will propose the third, non-Pregel like approach directly > > to > > > > > Gelly > > > > > > > > soon. > > > > > > > > > > > > > > > > > > If you have additional questions, I will be happy to answer > > > them. > > > > > > > > > > > > > > > > > > Andra > > > > > > > > > > > > > > > > > > On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov < > > > > > > > > > alexander.s.alexand...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > Hi Vasia, > > > > > > > > > > > > > > > > > > > > I am trying to look at the problem in more detail. Which > > > > version > > > > > of > > > > > > > the > > > > > > > > > MST > > > > > > > > > > are you talking about? > > > > > > > > > > > > > > > > > > > > Right now in the Gelly repository I can only find the > SSSP > > > > > example > > > > > > > > > > (parallel Bellman-Ford) from Section 4.2 in [1]. > > > > > > > > > > > > > > > > > > > > However, it seems that the issues encountered by Andra > are > > > > > related > > > > > > to > > > > > > > > the > > > > > > > > > > implementation of Parallel Boruvka (Section 3.2 in [2]). > Is > > > > that > > > > > > > > correct? > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > A. > > > > > > > > > > > > > > > > > > > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf > > > > > > > > > > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf > > > > > > > > > > > > > > > > > > > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri < > > > > > > > vasilikikala...@gmail.com > > > > > > > > >: > > > > > > > > > > > > > > > > > > > > > Hello beautiful Flink people, > > > > > > > > > > > > > > > > > > > > > > during the past few days, Andra and I have been > > discussing > > > > > about > > > > > > > how > > > > > > > > to > > > > > > > > > > > extend Gelly's iteration methods. > > > > > > > > > > > > > > > > > > > > > > Alexander's course (and his awesome students) has made > it > > > > > obvious > > > > > > > > that > > > > > > > > > > > vertex-centric iterations are not the best fit for > > > algorithms > > > > > > which > > > > > > > > > don't > > > > > > > > > > > follow the common "propagate-update" pattern. For > > example, > > > > > Andra > > > > > > is > > > > > > > > > > working > > > > > > > > > > > on an implementation of Minimum Spanning Tree, which > > > requires > > > > > > > > branching > > > > > > > > > > > inside an iteration and also requires a convergence > check > > > of > > > > an > > > > > > > > > internal > > > > > > > > > > > iteration. Others also reported similar issues [1, 2]. > > > Trying > > > > > to > > > > > > > fit > > > > > > > > > such > > > > > > > > > > > algorithms to the vertex-centric model leads to long > and > > > ugly > > > > > > code, > > > > > > > > > e.g. > > > > > > > > > > > aggregators to keep track of algorithm phases, > > duplicating > > > > > data, > > > > > > > etc. > > > > > > > > > > > > > > > > > > > > > > One limitation of the vertex-centric and the upcoming > GAS > > > > model > > > > > > is > > > > > > > > that > > > > > > > > > > > they both only allow the vertex values to be updated in > > > each > > > > > > > > iteration. > > > > > > > > > > > However, for some algorithms we need to update the edge > > > > values > > > > > > and > > > > > > > in > > > > > > > > > > > others we need to update both. In even more complex > > > > situations > > > > > > > (like > > > > > > > > > > > Andra's MST) in some iterations we need to update the > > > vertex > > > > > > values > > > > > > > > and > > > > > > > > > > in > > > > > > > > > > > some iterations we need to update the edge values. > > > > > > > > > > > Another problem is that we currently don't have a way > to > > > > allow > > > > > > > > > different > > > > > > > > > > > computational phases inside an iteration. This is > > something > > > > > that > > > > > > > > Giraph > > > > > > > > > > > solves with master compute, a function that is executed > > > once > > > > > > before > > > > > > > > > each > > > > > > > > > > > superstep and sets the computation function. > > > > > > > > > > > > > > > > > > > > > > All that said, I believe that we can solve most of > these > > > > issues > > > > > > if > > > > > > > we > > > > > > > > > > > nicely expose Flink's iteration operators in Gelly. I > can > > > see > > > > > the > > > > > > > > > > following > > > > > > > > > > > cases: > > > > > > > > > > > > > > > > > > > > > > 1. Bulk & delta iterations where the solution set is > the > > > > vertex > > > > > > > > > dataset: > > > > > > > > > > > this will be similar to vertex-centric and GAS, but > will > > > > allow > > > > > > more > > > > > > > > > > > flexible dataflows inside the iteration. > > > > > > > > > > > 2. Bulk & delta iterations where the solution set is > the > > > edge > > > > > > > > dataset: > > > > > > > > > > for > > > > > > > > > > > the cases where we need to update edge values. > > > > > > > > > > > 3. Bulk & delta iterations where the solution set is > the > > > > Graph: > > > > > > > this > > > > > > > > > will > > > > > > > > > > > cover more complex cases, where the algorithm updates > > both > > > > > > vertices > > > > > > > > and > > > > > > > > > > > edges or even adds/removes vertices/edges, i.e. updates > > the > > > > > whole > > > > > > > > > Graph. > > > > > > > > > > > > > > > > > > > > > > What do you think? I can see 1 & 2 being very easy to > > > > > implement, > > > > > > > but > > > > > > > > I > > > > > > > > > > > suspect 3 won't be that easy (but so awesome to have > ^^). > > > > > > > > > > > Would it work the way a Graph is represented now, i.e. > > > with 2 > > > > > > > > DataSets? > > > > > > > > > > > > > > > > > > > > > > Any comment, idea, pointer would be much appreciated! > > Thank > > > > you > > > > > > ^^ > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > -V. > > > > > > > > > > > > > > > > > > > > > > [1]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html > > > > > > > > > > > [2]: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >