GitHub user vasia commented on the pull request:

    https://github.com/apache/flink/pull/892#issuecomment-123236061
  
    Hi @shghatge,
    
    let me try to explain the implementation in detail here :)
    
    First of all, can you change this to a library method instead of an 
example? The modification should be very easy. You only need to move the class 
to the `org.apache.flink.graph.library` package and implement the 
`GraphAlgorithm` interface. This way, users will be able to use this method by 
simply calling `graph.run(new AdamicAdarSimilarity())`.
    
    Making this a library method also means that we don't have to use only 
Gelly methods, i.e. we can do a few things more efficiently.
    
    For example, at the beginning of the algorithm, you need to compute (1) the 
vertex "weights" and (2) the neighbor IDs for each vertex. Both of these 
computations can be done with a single GroupReduce on the edges dataset, i.e. 
`edges.flatMap(...).groupBy(0).reduceGroup(...)`, where in the flatMap you 
simply create the opposite-direction edges and in the reduceGroup you compute 
the neighborhood sets and the degrees (the size of each set), i.e. the weights.
    The result will be a dataset where each vertex has a `Tuple2` value with 
its "weight" as the first field and its neighbors as the second.
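    
    A minimal sketch of this step, written with plain Java collections rather 
than the Flink DataSet API so that it runs on its own. The names 
`NeighborhoodSketch`, `VertexInfo` and `compute` are hypothetical, and the 
"weight" is kept as the raw degree here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class NeighborhoodSketch {

    /** Stand-in for the Tuple2 value: degree ("weight") first, neighbor IDs second. */
    static final class VertexInfo {
        final int degree;
        final Set<Long> neighbors;

        VertexInfo(int degree, Set<Long> neighbors) {
            this.degree = degree;
            this.neighbors = neighbors;
        }
    }

    /** Each edge is a two-element array {source, target}. */
    static Map<Long, VertexInfo> compute(List<long[]> edges) {
        // "flatMap": emit every edge together with its opposite direction
        List<long[]> mirrored = new ArrayList<>();
        for (long[] e : edges) {
            mirrored.add(new long[] {e[0], e[1]});
            mirrored.add(new long[] {e[1], e[0]});
        }

        // "groupBy(0)": collect the targets of each source vertex into a set
        Map<Long, Set<Long>> grouped = new TreeMap<>();
        for (long[] e : mirrored) {
            grouped.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
        }

        // "reduceGroup": one record per vertex with its degree and neighbor set
        Map<Long, VertexInfo> result = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> g : grouped.entrySet()) {
            result.put(g.getKey(), new VertexInfo(g.getValue().size(), g.getValue()));
        }
        return result;
    }
}
```

    In the actual library method the three commented stages would map to the 
`flatMap`, `groupBy(0)` and `reduceGroup` calls on the edges dataset.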
    
    Similarly, instead of using `getTriplets()` (which is convenient but quite 
expensive), you can compute the partial edge values with a single 
`groupReduceOnNeighbors`. 
    Say you have vertices `(1, {d1, (2, 3, 4)})`, `(2, {d2, (1, 3)})`, `(3, 
{d3, (2, 4)})` and `(4, {d4, (1, 3)})`. 
    In `groupReduceOnNeighbors`, vertex 1 will compute the following:
    - neighbor 2: common neighbor 3 -> emit `(2, 3, d1)`
    - neighbor 3: common neighbor 4 -> emit `(3, 4, d1)`
    
    Finally, you can `groupBy(0, 1)` this result dataset and compute the sums 
to get the final Adamic-Adar similarities.
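    
    The emit-and-sum logic could look roughly like the following sketch, again 
in plain Java collections instead of Gelly so that it is self-contained. The 
class and method names are hypothetical. Two things are assumptions on my side: 
the weight of a vertex is taken as `1 / log(degree)` (the usual Adamic-Adar 
definition), and a pair is only emitted when the first ID is smaller than the 
second, so that each edge is counted once per common neighbor:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AdamicAdarSketch {

    /** Input: neighbor set per vertex ID. Output: summed score per (v, w) pair with v < w. */
    static Map<List<Long>, Double> similarities(Map<Long, Set<Long>> neighbors) {
        Map<List<Long>, Double> sums = new HashMap<>();

        for (Map.Entry<Long, Set<Long>> vertex : neighbors.entrySet()) {
            Set<Long> own = vertex.getValue();
            // Assumed weight: 1 / ln(degree). A degree-1 vertex gets an infinite
            // weight but never emits anything in a graph without self-loops.
            double weight = 1.0 / Math.log(own.size());

            // What one vertex would do inside groupReduceOnNeighbors
            for (Long v : own) {
                Set<Long> common =
                    new HashSet<>(neighbors.getOrDefault(v, Collections.emptySet()));
                common.retainAll(own);
                for (Long w : common) {
                    if (v < w) {
                        // "groupBy(0, 1)" followed by a sum, done incrementally
                        sums.merge(Arrays.asList(v, w), weight, Double::sum);
                    }
                }
            }
        }
        return sums;
    }
}
```

    With neighbor sets like the ones in the example above, vertex 1 
contributes its weight to the pairs `(2, 3)` and `(3, 4)`, and the other 
vertices add their contributions in the same way.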
    
    Let me know if this makes sense and whether you need more clarifications!

