Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/892#issuecomment-123236061

Hi @shghatge,

let me try to explain the implementation in detail here :)

First of all, can you change this to a library method instead of an example? The modification should be very easy. You only need to move the class to the `org.apache.flink.graph.library` package and implement the `GraphAlgorithm` interface. This way, users will be able to use this method by simply calling `graph.run(new AdamicAdarSimilarity())`.

Making this a library method also means that we are not restricted to Gelly methods, so we can do a few things more efficiently.

For example, at the beginning of the algorithm, you need to compute (1) the vertex "weights" and (2) the neighbor IDs for each vertex. Both of these computations can be done with a single GroupReduce on the edges dataset, i.e. `edges.flatMap(...).groupBy(0).reduceGroup(...)`, where in the flatMap you simply create the opposite-direction edges and in the reduceGroup you compute the neighborhood sets and the degrees (the size of each set), from which you get the weights. The result will be a dataset where each vertex has a `Tuple2` value with its "weight" as the first field and its neighbors as the second.

Similarly, instead of using `getTriplets()` (which is convenient but quite expensive), you can compute the partial edge values with a single `groupReduceOnNeighbors`.

Say you have vertices `(1, {d1, (2, 3, 4)})`, `(2, {d2, (1, 3)})`, `(3, {d3, (2, 4)})` and `(4, {d4, (1, 3)})`. In `groupReduceOnNeighbors`, vertex 1 will compute the following:
- neighbor 2: common neighbor 3 -> emit `(2, 3, d1)`
- neighbor 3: common neighbor 4 -> emit `(3, 4, d1)`

Finally, you can `groupBy(0, 1)` this result dataset and compute the sums to get the final Adamic-Adar similarities.

Let me know if this makes sense and whether you need more clarifications!
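To make the data flow above concrete, here is a minimal sketch of the same three steps in plain Java collections rather than Flink or Gelly operators. It is only an illustration under assumptions: the class name `AdamicAdarSketch` and the method `similarities` are hypothetical, the graph is treated as undirected with no self-loops, and the vertex weight is taken to be `1 / log(degree)`, which is the usual Adamic-Adar definition but is not spelled out in the comment. In the actual library method, the first loop would correspond to the `flatMap(...).groupBy(0).reduceGroup(...)` step and the `merge` call to the final `groupBy(0, 1)` sum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; this is not part of Flink or Gelly.
final class AdamicAdarSketch {

    /**
     * For every edge (u, v) with u < v, sums 1 / log(degree(z)) over all
     * common neighbors z of u and v. Edges without a common neighbor are
     * absent from the returned map.
     */
    static Map<List<Integer>, Double> similarities(int[][] edges) {
        // Step 1: add each edge in both directions and group by source vertex,
        // which yields the neighbor set (and therefore the degree) per vertex.
        Map<Integer, Set<Integer>> neighbors = new HashMap<>();
        for (int[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }

        Map<List<Integer>, Double> result = new HashMap<>();
        for (Set<Integer> neighborSet : neighbors.values()) {
            List<Integer> ns = new ArrayList<>(neighborSet); // sorted ascending
            if (ns.size() < 2) {
                continue; // a vertex with one neighbor is never a common neighbor
            }
            // Assumed weight: 1 / log(degree), the usual Adamic-Adar term.
            double weight = 1.0 / Math.log(ns.size());

            // Step 2: this vertex is a common neighbor of every pair of its own
            // neighbors; emit a partial value for the pairs that are also edges.
            for (int i = 0; i < ns.size(); i++) {
                for (int j = i + 1; j < ns.size(); j++) {
                    int u = ns.get(i);
                    int v = ns.get(j);
                    if (neighbors.get(u).contains(v)) {
                        // Step 3: group by (u, v) and sum the partial values.
                        result.merge(Arrays.asList(u, v), weight, Double::sum);
                    }
                }
            }
        }
        return result;
    }
}
```

For instance, calling `AdamicAdarSketch.similarities(new int[][] {{1, 2}, {1, 3}, {2, 3}, {3, 4}})` gives edge (1, 2) the value `1 / log(3)`, because its only common neighbor is vertex 3, which has degree 3.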