Hi Martin, thanks for your input!
On 7 January 2016 at 09:36, Martin Junghanns <m.jungha...@mailbox.org> wrote: > Hi, > > this would be a very nice addition! I had a glimpse look into the PC > implementation and the two library algorithms and when you get the > idea, it is easy to follow what's happening. The benchmark results are > also very promising. > > I got some questions about partitions: > > 1) I was wondering if the coGroup after the PartitionProcessUDF is aware > of the initial partitioning. To be more precise, is it guaranteed, that > the (vertex-)partitions do not change during the whole iteration and the > RichEdges "stay" on the same worker? > I think you can find the questions to your answers in the attached generated plan [1]. The edges are cached, meaning that the partitions remain on the same workers across iterations. Records from the first coGroup to the mapPartition are forwarded (no shuffling). The output of the mapPartition operator is a tuple of <vertexID, message>, which is sorted and shuffled to reach the destination vertices. > 2) If 1) is true, would it work to call partition(udf) on the vertex > data set before the PC iteration starts? > Do you mean replacing the default hash partitioning with some custom partition function? I guess that'd would work, yes. > 3) If 1) is false, would it work to call partition(udf) after the > coGroup during the delta iteration to achieve partition stability? > > What I got in mind is something like: > > 1 Compute partitions or a set of subgraphs > (via CC, LP, Pattern Matching, ...) > 2 Partition Vertices by computed Partition/Subgraph ID > 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per > Partition/Subgraph via PC iteration > > Again, nice work! > > Best, > Martin > Cheers, -Vasia. [1]: https://drive.google.com/file/d/0BzQJrI2eGlyYSHZJTHFObmgxTXM/view?usp=sharing > > On 06.01.2016 23:29, Vasiliki Kalavri wrote: > > Hi squirrels, > > > > here's some more Gelly iteration love for you ;) > > > > Together with Kien and Nikola (students at KTH, cc-ed), we looked into > > partition-centric iterations for Gelly. The idea of the partition-centric > > model is to expose the graph partition structure to the user (not just a > > single vertex, but all vertices belonging to the same partition at once). > > It allows for local graph structure exploitation inside a partition and > > potentially avoids unnecessary communication. The model was first > > introduced in [1] and we used it as inspiration, but our current > > implementation is closer to the Spargel model. > > > > Kien and Nikola prepared some slides with their design choices and > > experiments [2]. The connected components results are quite impressive :) > > There is still room for optimization, but you can find the wip repository > > in [3] if you're interested. > > > > What do you think about exposing this as a Gelly iteration model? The > task > > has been in the Gelly roadmap and there are certainly performance > > advantages for some algorithms. This implementation can be basically > > considered as a Spargel optimization (existing VertexUpdateFunctions from > > Spargel can actually be re-used). On the other hand, the model is quite > > low-level compared to Spargel and vertex-centric, since users have to > deal > > with inside-partition algorithms themselves (e.g. compare the connected > > components implementation [4] with the existing Spargel/GSA ones). > > > > Thanks! > > -Vasia. > > > > [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf > > [2]: > > > https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing > > [3]: https://github.com/vasia/gelly-partition-centric > > [4]: > > > https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java > > >