Hi Alex, Vasia is talking about the second version(presented Friday) of Parallel Boruvka, which can be found here: https://github.com/TU-Berlin-DIMA/IMPRO-3.WS14/pull/59
I will propose the third, non-Pregel like approach directly to Gelly soon. If you have additional questions, I will be happy to answer them. Andra On Sun, Feb 22, 2015 at 4:23 PM, Alexander Alexandrov < alexander.s.alexand...@gmail.com> wrote: > Hi Vasia, > > I am trying to look at the problem in more detail. Which version of the MST > are you talking about? > > Right now in the Gelly repository I can only find the SSSP example > (parallel Bellman-Ford) from Section 4.2 in [1]. > > However, it seems that the issues encountered by Andra are related to the > implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct? > > Regards, > A. > > [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf > [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf > > 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>: > > > Hello beautiful Flink people, > > > > during the past few days, Andra and I have been discussing about how to > > extend Gelly's iteration methods. > > > > Alexander's course (and his awesome students) has made it obvious that > > vertex-centric iterations are not the best fit for algorithms which don't > > follow the common "propagate-update" pattern. For example, Andra is > working > > on an implementation of Minimum Spanning Tree, which requires branching > > inside an iteration and also requires a convergence check of an internal > > iteration. Others also reported similar issues [1, 2]. Trying to fit such > > algorithms to the vertex-centric model leads to long and ugly code, e.g. > > aggregators to keep track of algorithm phases, duplicating data, etc. > > > > One limitation of the vertex-centric and the upcoming GAS model is that > > they both only allow the vertex values to be updated in each iteration. > > However, for some algorithms we need to update the edge values and in > > others we need to update both. In even more complex situations (like > > Andra's MST) in some iterations we need to update the vertex values and > in > > some iterations we need to update the edge values. > > Another problem is that we currently don't have a way to allow different > > computational phases inside an iteration. This is something that Giraph > > solves with master compute, a function that is executed once before each > > superstep and sets the computation function. > > > > All that said, I believe that we can solve most of these issues if we > > nicely expose Flink's iteration operators in Gelly. I can see the > following > > cases: > > > > 1. Bulk & delta iterations where the solution set is the vertex dataset: > > this will be similar to vertex-centric and GAS, but will allow more > > flexible dataflows inside the iteration. > > 2. Bulk & delta iterations where the solution set is the edge dataset: > for > > the cases where we need to update edge values. > > 3. Bulk & delta iterations where the solution set is the Graph: this will > > cover more complex cases, where the algorithm updates both vertices and > > edges or even adds/removes vertices/edges, i.e. updates the whole Graph. > > > > What do you think? I can see 1 & 2 being very easy to implement, but I > > suspect 3 won't be that easy (but so awesome to have ^^). > > Would it work the way a Graph is represented now, i.e. with 2 DataSets? > > > > Any comment, idea, pointer would be much appreciated! Thank you ^^ > > > > Cheers, > > -V. > > > > [1]: > > > > > http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html > > [2]: > > > > > http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769 > > >