Hi,

this would be a very nice addition! I had a glimpse look into the PC
implementation and the two library algorithms and when you get the
idea, it is easy to follow what's happening. The benchmark results are
also very promising.

I got some questions about partitions:

1) I was wondering if the coGroup after the PartitionProcessUDF is aware
of the initial partitioning. To be more precise, is it guaranteed, that
the (vertex-)partitions do not change during the whole iteration and the
RichEdges "stay" on the same worker?
2) If 1) is true, would it work to call partition(udf) on the vertex
data set before the PC iteration starts?
3) If 1) is false, would it work to call partition(udf) after the
coGroup during the delta iteration to achieve partition stability?

What I got in mind is something like:

1 Compute partitions or a set of subgraphs
  (via CC, LP, Pattern Matching, ...)
2 Partition Vertices by computed Partition/Subgraph ID
3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per
Partition/Subgraph via PC iteration

Again, nice work!

Best,
Martin

On 06.01.2016 23:29, Vasiliki Kalavri wrote:
> Hi squirrels,
> 
> here's some more Gelly iteration love for you ;)
> 
> Together with Kien and Nikola (students at KTH, cc-ed), we looked into
> partition-centric iterations for Gelly. The idea of the partition-centric
> model is to expose the graph partition structure to the user (not just a
> single vertex, but all vertices belonging to the same partition at once).
> It allows for local graph structure exploitation inside a partition and
> potentially avoids unnecessary communication. The model was first
> introduced in [1] and we used it as inspiration, but our current
> implementation is closer to the Spargel model.
> 
> Kien and Nikola prepared some slides with their design choices and
> experiments [2]. The connected components results are quite impressive :)
> There is still room for optimization, but you can find the wip repository
> in [3] if you're interested.
> 
> What do you think about exposing this as a Gelly iteration model? The task
> has been in the Gelly roadmap and there are certainly performance
> advantages for some algorithms. This implementation can be basically
> considered as a Spargel optimization (existing VertexUpdateFunctions from
> Spargel can actually be re-used). On the other hand, the model is quite
> low-level compared to Spargel and vertex-centric, since users have to deal
> with inside-partition algorithms themselves (e.g. compare the connected
> components implementation [4] with the existing Spargel/GSA ones).
> 
> Thanks!
> -Vasia.
> 
> [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf
> [2]:
> https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing
> [3]: https://github.com/vasia/gelly-partition-centric
> [4]:
> https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java
> 

Reply via email to