[ 
https://issues.apache.org/jira/browse/FLINK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860296#comment-15860296
 ] 

ASF GitHub Bot commented on FLINK-4896:
---------------------------------------

Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2733
  
    @vasia Running on a c4.xlarge with 4 slots and a 4 GB preallocated 
TaskManager heap. EdgeList measures the time to simplify the graph, since the 
library PageRank implementations using Scatter-Gather ("PageRankSG") and 
Gather-Sum-Apply ("PageRankGSA") require each vertex to have both incoming and 
outgoing edges. "PageRank" is the algorithm from this PR. Each execution 
performs 10 iterations.
    
    Algorithm | Scale 16 | Scale 18 | Scale 20 | Scale 22
    ------------ | ------------- | ------------- | ------------- | -------------
    EdgeList | 2.537 s | 8.779 s | 34.105 s | 141.512 s
    PageRank | 9.563 s | 39.558 s | 168.401 s | 740.345 s
    PageRankSG | 11.188 s | 47.736 s | 216.688 s | 1041.176 s
    PageRankGSA | 14.001 s | 60.663 s | 268.568 s | 1241.344 s
    Speedup over SG | 23% | 26% | 35% | 50%
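
    The "Speedup over SG" row can be reproduced from the timings above under
    two assumptions that the comment does not state: the EdgeList time is
    subtracted from both PageRank and PageRankSG before comparing, and the
    percentage is truncated rather than rounded. A minimal sketch (the helper
    name `speedup_percent` is hypothetical):

```python
# Timings in seconds, copied from the table above (Scale 16, 18, 20, 22).
edge_list = [2.537, 8.779, 34.105, 141.512]
pagerank = [9.563, 39.558, 168.401, 740.345]
pagerank_sg = [11.188, 47.736, 216.688, 1041.176]


def speedup_percent(baseline, candidate, overhead):
    """Percent speedup of `candidate` over `baseline`.

    ASSUMPTION: `overhead` (the EdgeList time) is shared by both runs and is
    removed from each before the ratio is taken.
    """
    return ((baseline - overhead) / (candidate - overhead) - 1.0) * 100.0


# ASSUMPTION: the table truncates the percentage instead of rounding it.
speedups = [
    int(speedup_percent(sg, pr, el))
    for sg, pr, el in zip(pagerank_sg, pagerank, edge_list)
]
print(speedups)
```

    Without subtracting the EdgeList time, the same ratio gives smaller
    percentages, so the figures in the table look like net algorithm time.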
    
    I'm surprised that PageRankSG is faster than PageRankGSA.
    
    The current SG and GSA implementations would be good examples, and already 
have integration tests.


> PageRank algorithm for directed graphs
> --------------------------------------
>
>                 Key: FLINK-4896
>                 URL: https://issues.apache.org/jira/browse/FLINK-4896
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.2.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> Gelly includes PageRank implementations for scatter-gather and 
> gather-sum-apply. Both ship with the warning "The implementation assumes that 
> each page has at least one incoming and one outgoing link."
> PageRank is a directed algorithm and sources and sinks are common in directed 
> graphs.
> Sinks drain the total score across the graph, which affects convergence and 
> the balance of the random hop (convergence is not currently a feature of 
> Gelly's PageRanks as this is a very recent feature from FLINK-3888).
> Sources are handled nicely by the algorithm highlighted on Flink's features 
> page under "Iterations and Delta Iterations" since score deltas are 
> transmitted and a source's score never changes (is always equal to the random 
> hop probability divided by the vertex count).
>   https://flink.apache.org/features.html
> We should find an implementation featuring convergence and unrestricted 
> processing of directed graphs and move other implementations to Gelly 
> examples.
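
The two effects described in the issue can be seen on a toy graph. The sketch
below is a textbook power-iteration PageRank, not Gelly's code and not the
algorithm from the PR; the function name `pagerank`, the damping factor 0.85,
and the four-vertex graph are all assumptions chosen for illustration. With no
special handling, the sink absorbs score and the total falls below 1, while the
source's score stays at the random hop probability divided by the vertex
count. Spreading the sinks' score evenly over all vertices (one common fix)
restores the total.

```python
def pagerank(edges, n, damping=0.85, iterations=10, redistribute_sinks=False):
    """Textbook PageRank by power iteration on vertices 0..n-1 (illustrative only)."""
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each vertex splits its score evenly among its outgoing edges.
        incoming = [0.0] * n
        for src, dst in edges:
            incoming[dst] += scores[src] / out_degree[src]

        # Score held by sinks (vertices with no outgoing edges) is lost
        # unless it is explicitly handed back to the graph.
        sink_mass = 0.0
        if redistribute_sinks:
            sink_mass = sum(s for s, d in zip(scores, out_degree) if d == 0)

        scores = [
            (1.0 - damping) / n + damping * (incoming[v] + sink_mass / n)
            for v in range(n)
        ]
    return scores


# Vertex 0 is a source (no incoming edges); vertex 3 is a sink (no outgoing edges).
edges = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3)]
n = 4

naive = pagerank(edges, n)
fixed = pagerank(edges, n, redistribute_sinks=True)

print("total without sink handling:", sum(naive))  # below 1: the sink drains score
print("total with sink handling:   ", sum(fixed))  # stays at 1 up to rounding
print("source score (naive):       ", naive[0])    # (1 - damping) / n
```

Note that once sink score is redistributed, the source also receives a share,
so its score is no longer exactly the random hop probability divided by the
vertex count.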



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
