Hi Vasia, I am trying to look at the problem in more detail. Which version of the MST are you talking about?
Right now in the Gelly repository I can only find the SSSP example (parallel Bellman-Ford) from Section 4.2 in [1]. However, it seems that the issues encountered by Andra are related to the implementation of Parallel Boruvka (Section 3.2 in [2]). Is that correct? Regards, A. [1] http://www.vldb.org/pvldb/vol7/p1047-han.pdf [2] http://www.vldb.org/pvldb/vol7/p577-salihoglu.pdf 2015-02-19 21:03 GMT+01:00 Vasiliki Kalavri <vasilikikala...@gmail.com>: > Hello beautiful Flink people, > > during the past few days, Andra and I have been discussing about how to > extend Gelly's iteration methods. > > Alexander's course (and his awesome students) has made it obvious that > vertex-centric iterations are not the best fit for algorithms which don't > follow the common "propagate-update" pattern. For example, Andra is working > on an implementation of Minimum Spanning Tree, which requires branching > inside an iteration and also requires a convergence check of an internal > iteration. Others also reported similar issues [1, 2]. Trying to fit such > algorithms to the vertex-centric model leads to long and ugly code, e.g. > aggregators to keep track of algorithm phases, duplicating data, etc. > > One limitation of the vertex-centric and the upcoming GAS model is that > they both only allow the vertex values to be updated in each iteration. > However, for some algorithms we need to update the edge values and in > others we need to update both. In even more complex situations (like > Andra's MST) in some iterations we need to update the vertex values and in > some iterations we need to update the edge values. > Another problem is that we currently don't have a way to allow different > computational phases inside an iteration. This is something that Giraph > solves with master compute, a function that is executed once before each > superstep and sets the computation function. > > All that said, I believe that we can solve most of these issues if we > nicely expose Flink's iteration operators in Gelly. I can see the following > cases: > > 1. Bulk & delta iterations where the solution set is the vertex dataset: > this will be similar to vertex-centric and GAS, but will allow more > flexible dataflows inside the iteration. > 2. Bulk & delta iterations where the solution set is the edge dataset: for > the cases where we need to update edge values. > 3. Bulk & delta iterations where the solution set is the Graph: this will > cover more complex cases, where the algorithm updates both vertices and > edges or even adds/removes vertices/edges, i.e. updates the whole Graph. > > What do you think? I can see 1 & 2 being very easy to implement, but I > suspect 3 won't be that easy (but so awesome to have ^^). > Would it work the way a Graph is represented now, i.e. with 2 DataSets? > > Any comment, idea, pointer would be much appreciated! Thank you ^^ > > Cheers, > -V. > > [1]: > > http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Can-a-master-class-control-the-superstep-in-Flink-Spargel-td733.html > [2]: > > http://issues.apache.org/jira/browse/FLINK-1552?focusedCommentId=14325769&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14325769 >