Hi Martin, > What I got in mind is something like: > 1 Compute partitions or a set of subgraphs > (via CC, LP, Pattern Matching, ...) > 2 Partition Vertices by computed Partition/Subgraph ID > 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per > Partition/Subgraph via PC iteration
I did try this idea at the start of the project, but was unable to store the computed partitions efficiently so I abandoned it. My original approach was storing each partition as an element of a DataSet, but the partition object can get very large and cause big serialization/de-serialization overhead. This approach also breaks if one partition can't be fitted into memory. Best regards, Kien On Thu, Jan 7, 2016 at 5:14 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote: > Hi Martin, > > thanks for your input! > > > On 7 January 2016 at 09:36, Martin Junghanns <m.jungha...@mailbox.org> > wrote: > > > Hi, > > > > this would be a very nice addition! I had a glimpse look into the PC > > implementation and the two library algorithms and when you get the > > idea, it is easy to follow what's happening. The benchmark results are > > also very promising. > > > > I got some questions about partitions: > > > > 1) I was wondering if the coGroup after the PartitionProcessUDF is aware > > of the initial partitioning. To be more precise, is it guaranteed, that > > the (vertex-)partitions do not change during the whole iteration and the > > RichEdges "stay" on the same worker? > > > > > I think you can find the questions to your answers in the attached > generated plan [1]. > The edges are cached, meaning that the partitions remain on the same > workers across iterations. > Records from the first coGroup to the mapPartition are forwarded (no > shuffling). > The output of the mapPartition operator is a tuple of <vertexID, message>, > which is sorted and shuffled to reach the destination vertices. > > > > > 2) If 1) is true, would it work to call partition(udf) on the vertex > > data set before the PC iteration starts? > > > > Do you mean replacing the default hash partitioning with some custom > partition function? I guess that'd would work, yes. > > > > > 3) If 1) is false, would it work to call partition(udf) after the > > coGroup during the delta iteration to achieve partition stability? > > > > What I got in mind is something like: > > > > 1 Compute partitions or a set of subgraphs > > (via CC, LP, Pattern Matching, ...) > > 2 Partition Vertices by computed Partition/Subgraph ID > > 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per > > Partition/Subgraph via PC iteration > > > > Again, nice work! > > > > Best, > > Martin > > > > > > Cheers, > -Vasia. > > [1]: > > https://drive.google.com/file/d/0BzQJrI2eGlyYSHZJTHFObmgxTXM/view?usp=sharing > > > > > > On 06.01.2016 23:29, Vasiliki Kalavri wrote: > > > Hi squirrels, > > > > > > here's some more Gelly iteration love for you ;) > > > > > > Together with Kien and Nikola (students at KTH, cc-ed), we looked into > > > partition-centric iterations for Gelly. The idea of the > partition-centric > > > model is to expose the graph partition structure to the user (not just > a > > > single vertex, but all vertices belonging to the same partition at > once). > > > It allows for local graph structure exploitation inside a partition and > > > potentially avoids unnecessary communication. The model was first > > > introduced in [1] and we used it as inspiration, but our current > > > implementation is closer to the Spargel model. > > > > > > Kien and Nikola prepared some slides with their design choices and > > > experiments [2]. The connected components results are quite impressive > :) > > > There is still room for optimization, but you can find the wip > repository > > > in [3] if you're interested. > > > > > > What do you think about exposing this as a Gelly iteration model? The > > task > > > has been in the Gelly roadmap and there are certainly performance > > > advantages for some algorithms. This implementation can be basically > > > considered as a Spargel optimization (existing VertexUpdateFunctions > from > > > Spargel can actually be re-used). On the other hand, the model is quite > > > low-level compared to Spargel and vertex-centric, since users have to > > deal > > > with inside-partition algorithms themselves (e.g. compare the connected > > > components implementation [4] with the existing Spargel/GSA ones). > > > > > > Thanks! > > > -Vasia. > > > > > > [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf > > > [2]: > > > > > > https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing > > > [3]: https://github.com/vasia/gelly-partition-centric > > > [4]: > > > > > > https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java > > > > > >