Hi Vasia,

I shall polish the functions a bit, but this is more or less what I had in
mind:
GSA Jaccard [what we have in Gelly right now]:
https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/GSAJaccardSimilarityMeasure.java
The same version with node split:
https://github.com/andralungu/gelly-partitioning/blob/master/src/main/java/example/NodeSplittingGSAJaccard.java

Yes, sure I can also share some charts with the results. I could say for
which data sets (power law) and types of algorithms this can be used. If
it's used appropriately, the overhead is 0.

I'm open to suggestions!
Thanks!
Andra


On Tue, Aug 11, 2015 at 2:45 PM, Vasiliki Kalavri <vasilikikala...@gmail.com
> wrote:

> Hi Andra,
>
> thanks for offering to add this work to Gelly and for starting the
> discussion!
>
> How do you think this would look like from an API point of view? Is it easy
> to make it transparent to the application? Could you give us a simple
> example of what you have in mind?
>
> Apart from usability, we should also consider documenting when to use this
> feature, i.e. for which algorithms, what kind degree distribution and what
> overhead to expect (if any).
>
> Cheers,
> Vasia.
> On Aug 10, 2015 10:47 AM, "Andra Lungu" <lungu.an...@gmail.com> wrote:
>
> > Hey,
> >
> > Before actually opening a PR, I wanted to hear your opinion. So, here
> goes
> > nothing :).
> >
> > I'd like to add the core of my master thesis to Gelly. That is, a series
> of
> > operators that take a skewed graph, split its high degree vertices into
> > subvertices and redistribute the edges accordingly (thus producing the
> > result faster due to the increased parallelism); then merge the
> subvertices
> > into the initial vertex.
> >
> > Here's part of my -unrevised- abstract:
> > "The Twitter follower graph, the citation network and general purpose
> > social networks follow a power law degree distribution. These highly
> skewed
> > graphs raise challenges to current computation models which uniformly
> > process vertices, leading to communication overhead or large execution
> time
> > stalls.
> >
> > In spite of efforts made to scale up computation on natural graphs
> > (PowerGraph ’s Gather - Apply - Scatter model), many performance problems
> > raise from a subset of vertices that have high degrees. In this thesis,
> we
> > outline the main processing issues that arise in current graph systems.
> We
> > then propose a novel node splitting technique meant to differentiate the
> > computation as well as partitioning strategies for low and high degree
> > nodes.
> >
> > The “Node Splitting” abstraction is implemented as a separate set of
> > operators that can be seamlessly combined with current degree-oblivious
> > models such as Pregel and PowerGraph, but also with degree-aware engines
> > such as Pregel+ and PowerLyra. Our solution introduces minimal overhead
> in
> > the absence of skew and improves the naive implementation of a subset of
> > computational intensive algorithms by a factor of two.
> >
> > These results have been proven on a theoretical and practical level on
> both
> > real world and synthetic graphs."
> >
> > What I would add:
> > 1) the operators per se in a separate package
> > 2) a slightly modified example (e.g. take Jaccard Similarity and apply
> > these operators on it)
> > 3). tests
> > 4). a separate section in the gelly docs that explains how to modify your
> > code to use this solution.
> >
> > Needless to say I will maintain the code and, if, as we get more users,
> > they deem this sub-API useless, we can always deprecate and delete it :).
> >
> > So, what do you think?
> > Andra
> >
>

Reply via email to