[ https://issues.apache.org/jira/browse/FLINK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860296#comment-15860296 ]
ASF GitHub Bot commented on FLINK-4896:
---------------------------------------

Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2733

    @vasia Running on a c4.xlarge with 4 slots and a 4 GB preallocated TaskManager heap. EdgeList measures the time to simplify the graph, which is needed because the library PageRank implementations using Scatter-Gather ("PageRankSG") and Gather-Sum-Apply ("PageRankGSA") require each vertex to have both incoming and outgoing edges. "PageRank" is the algorithm from this PR. Each execution performs 10 iterations.

    Algorithm | Scale 16 | Scale 18 | Scale 20 | Scale 22
    ------------ | ------------- | ------------- | ------------- | -------------
    EdgeList | 2.537 s | 8.779 s | 34.105 s | 141.512 s
    PageRank | 9.563 s | 39.558 s | 168.401 s | 740.345 s
    PageRankSG | 11.188 s | 47.736 s | 216.688 s | 1041.176 s
    PageRankGSA | 14.001 s | 60.663 s | 268.568 s | 1241.344 s
    Speedup over SG | 23% | 26% | 35% | 50%

    I'm surprised that PageRankSG is performing faster than PageRankGSA.

    The current SG and GSA implementations would be good examples, and already have integration tests.

> PageRank algorithm for directed graphs
> --------------------------------------
>
>                 Key: FLINK-4896
>                 URL: https://issues.apache.org/jira/browse/FLINK-4896
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.2.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> Gelly includes PageRank implementations for scatter-gather and
> gather-sum-apply. Both ship with the warning "The implementation assumes that
> each page has at least one incoming and one outgoing link."
> PageRank is a directed algorithm and sources and sinks are common in directed
> graphs.
> Sinks drain the total score across the graph which affects convergence and
> the balance of the random hop (convergence is not currently a feature of
> Gelly's PageRanks as this is a very recent feature from FLINK-3888).
> Sources are handled nicely by the algorithm highlighted on Flink's features
> page under "Iterations and Delta Iterations" since score deltas are
> transmitted and a source's score never changes (is always equal to the random
> hop probability divided by the vertex count).
> https://flink.apache.org/features.html
> We should find an implementation featuring convergence and unrestricted
> processing of directed graphs and move other implementations to Gelly
> examples.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
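
For readers following the issue description above, here is a minimal Python sketch of one way to keep the total score from draining through sinks. It is not the code from the PR and not Gelly's API. The function name `pagerank`, its parameters, and the choice to spread sink score uniformly over all vertices are assumptions made for illustration only. Under that assumption a source's score is the random hop share plus its share of the redistributed sink score.

```python
def pagerank(edges, num_vertices, damping=0.85, max_iterations=10, tolerance=1e-9):
    """Power-iteration PageRank over directed (src, dst) edges.

    Hypothetical helper: vertex ids are assumed to be 0..num_vertices-1.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    scores = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iterations):
        # Score held by sinks (no outgoing edges) would otherwise leave the graph.
        sink_mass = sum(s for s, d in zip(scores, out_degree) if d == 0)

        # Assumption: every vertex gets the random hop plus an equal share of sink score.
        base = (1.0 - damping) / num_vertices + damping * sink_mass / num_vertices
        updated = [base] * num_vertices
        for src, dst in edges:
            updated[dst] += damping * scores[src] / out_degree[src]

        # Stop early once the scores have stopped moving (L1 distance).
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tolerance:
            break
    return scores


# Vertex 0 is a source (no incoming edges) and vertex 2 is a sink (no outgoing edges).
example_scores = pagerank([(0, 1), (1, 2), (0, 2)], num_vertices=3, max_iterations=100)
```

Because the sink score is handed back on every iteration, the scores keep summing to 1, so the convergence test compares like with like from one iteration to the next.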