Re: [gelly] partition-centric iterations

Truong Duc Kien Fri, 08 Jan 2016 03:31:21 -0800

Hi Martin,

> What I got in mind is something like:
> 1 Compute partitions or a set of subgraphs
>  (via CC, LP, Pattern Matching, ...)
> 2 Partition Vertices by computed Partition/Subgraph ID
> 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per
> Partition/Subgraph via PC iteration


I did try this idea at the start of the project, but was unable to
store the computed partitions efficiently so I abandoned it.
My original approach was storing each partition as an element
of a DataSet, but the partition object can get very large and
cause big serialization/de-serialization overhead.
This approach also breaks if one partition can't be fitted into memory.

Best regards,
Kien


On Thu, Jan 7, 2016 at 5:14 PM, Vasiliki Kalavri <vasilikikala...@gmail.com>
wrote:

> Hi Martin,
>
> thanks for your input!
>
>
> On 7 January 2016 at 09:36, Martin Junghanns <m.jungha...@mailbox.org>
> wrote:
>
> > Hi,
> >
> > this would be a very nice addition! I had a glimpse look into the PC
> > implementation and the two library algorithms and when you get the
> > idea, it is easy to follow what's happening. The benchmark results are
> > also very promising.
> >
> > I got some questions about partitions:
> >
> > 1) I was wondering if the coGroup after the PartitionProcessUDF is aware
> > of the initial partitioning. To be more precise, is it guaranteed, that
> > the (vertex-)partitions do not change during the whole iteration and the
> > RichEdges "stay" on the same worker?
> >
>
>
> I think you can find the questions to your answers in the attached
> generated plan [1].
> The edges are cached, meaning that the partitions remain on the same
> workers across iterations.
> Records from the first coGroup to the mapPartition are forwarded (no
> shuffling).
> The output of the mapPartition operator is a tuple of <vertexID, message>,
> which is sorted and shuffled to reach the destination vertices.
>
>
>
> > 2) If 1) is true, would it work to call partition(udf) on the vertex
> > data set before the PC iteration starts?
> >
>
> Do you mean replacing the default hash partitioning with some custom
> partition function? I guess that'd would work, yes.
>
>
>
> > 3) If 1) is false, would it work to call partition(udf) after the
> > coGroup during the delta iteration to achieve partition stability?
> >
> > What I got in mind is something like:
> >
> > 1 Compute partitions or a set of subgraphs
> >   (via CC, LP, Pattern Matching, ...)
> > 2 Partition Vertices by computed Partition/Subgraph ID
> > 3 Compute Algorithm X (Page Rank, BC, SSSP, FSM...) per
> > Partition/Subgraph via PC iteration
> >
> > Again, nice work!
> >
> > Best,
> > Martin
> >
>
>
>
> Cheers,
> -Vasia.
>
> [1]:
>
> https://drive.google.com/file/d/0BzQJrI2eGlyYSHZJTHFObmgxTXM/view?usp=sharing
>
>
> >
> > On 06.01.2016 23:29, Vasiliki Kalavri wrote:
> > > Hi squirrels,
> > >
> > > here's some more Gelly iteration love for you ;)
> > >
> > > Together with Kien and Nikola (students at KTH, cc-ed), we looked into
> > > partition-centric iterations for Gelly. The idea of the
> partition-centric
> > > model is to expose the graph partition structure to the user (not just
> a
> > > single vertex, but all vertices belonging to the same partition at
> once).
> > > It allows for local graph structure exploitation inside a partition and
> > > potentially avoids unnecessary communication. The model was first
> > > introduced in [1] and we used it as inspiration, but our current
> > > implementation is closer to the Spargel model.
> > >
> > > Kien and Nikola prepared some slides with their design choices and
> > > experiments [2]. The connected components results are quite impressive
> :)
> > > There is still room for optimization, but you can find the wip
> repository
> > > in [3] if you're interested.
> > >
> > > What do you think about exposing this as a Gelly iteration model? The
> > task
> > > has been in the Gelly roadmap and there are certainly performance
> > > advantages for some algorithms. This implementation can be basically
> > > considered as a Spargel optimization (existing VertexUpdateFunctions
> from
> > > Spargel can actually be re-used). On the other hand, the model is quite
> > > low-level compared to Spargel and vertex-centric, since users have to
> > deal
> > > with inside-partition algorithms themselves (e.g. compare the connected
> > > components implementation [4] with the existing Spargel/GSA ones).
> > >
> > > Thanks!
> > > -Vasia.
> > >
> > > [1]: http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf
> > > [2]:
> > >
> >
> https://drive.google.com/file/d/0BzQJrI2eGlyYRmlKTjMyVEk1T0U/view?usp=sharing
> > > [3]: https://github.com/vasia/gelly-partition-centric
> > > [4]:
> > >
> >
> https://github.com/vasia/gelly-partition-centric/blob/master/src/main/java/org/apache/flink/graph/partition/centric/library/PCConnectedComponents.java
> > >
> >
>

Re: [gelly] partition-centric iterations

Reply via email to