Hi, this would be a very nice addition! I had a glimpse look into the PC implementation and the two library algorithms and when you get the idea, it is easy to follow what's happening. The benchmark results are also very promising.
I got some questions about partitions: 1) I was wondering if the coGroup after the PartitionProcessUDF is aware of the initial partitioning. To be more precise, is it guaranteed, that the (vertex-)partitions do not change during the whole iteration and the RichEdges "stay" on the same worker? 2) If 1) is true, would it work to call partition(udf) on the vertex data set before the PC iteration starts? 3) If 1) is false, would it work to call partition(udf) after the coGroup during the delta iteration to achieve partition stability? What I got in mind is something like: 1 Compute partitions or a set of subgraphs (via CC, LP, Pattern Matching, ...) 2 Partition Vertices by computed Partition/Subgraph ID 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per Partition/Subgraph via PC iteration Again, nice work! Best, Martin On 06.01.2016 23:29, Vasiliki Kalavri wrote: > Hi squirrels, > > here's some more Gelly iteration love for you ;) > > Together with Kien and Nikola (students at KTH, cc-ed), we looked into > partition-centric iterations for Gelly. The idea of the partition-centric > model is to expose the graph partition structure to the user (not just a > single vertex, but all vertices belonging to the same partition at once). > It allows for local graph structure exploitation inside a partition and > potentially avoids unnecessary communication. The model was first > introduced in [1] and we used it as inspiration, but our current > implementation is closer to the Spargel model. > > Kien and Nikola prepared some slides with their design choices and > experiments [2]. The connected components results are quite impressive :) > There is still room for optimization, but you can find the wip repository > in [3] if you're interested. > > What do you think about exposing this as a Gelly iteration model? The task > has been in the Gelly roadmap and there are certainly performance > advantages for some algorithms. This implementation can be basically > considered as a Spargel optimization (existing VertexUpdateFunctions from > Spargel can actually be re-used). On the other hand, the model is quite > low-level compared to Spargel and vertex-centric, since users have to deal > with inside-partition algorithms themselves (e.g. compare the connected > components implementation [4] with the existing Spargel/GSA ones). > > Thanks! > -Vasia. > > [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf > [2]: > https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing > [3]: https://github.com/vasia/gelly-partition-centric > [4]: > https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java >