Hi squirrels, here's some more Gelly iteration love for you ;)
Together with Kien and Nikola (students at KTH, cc-ed), we looked into partition-centric iterations for Gelly. The idea of the partition-centric model is to expose the graph partition structure to the user (not just a single vertex, but all vertices belonging to the same partition at once). It allows for local graph structure exploitation inside a partition and potentially avoids unnecessary communication. The model was first introduced in [1] and we used it as inspiration, but our current implementation is closer to the Spargel model. Kien and Nikola prepared some slides with their design choices and experiments [2]. The connected components results are quite impressive :) There is still room for optimization, but you can find the wip repository in [3] if you're interested. What do you think about exposing this as a Gelly iteration model? The task has been in the Gelly roadmap and there are certainly performance advantages for some algorithms. This implementation can be basically considered as a Spargel optimization (existing VertexUpdateFunctions from Spargel can actually be re-used). On the other hand, the model is quite low-level compared to Spargel and vertex-centric, since users have to deal with inside-partition algorithms themselves (e.g. compare the connected components implementation [4] with the existing Spargel/GSA ones). Thanks! -Vasia. [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf [2]: https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing [3]: https://github.com/vasia/gelly-partition-centric [4]: https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java