Hi Marc,

FLINK-7273 updates Gelly's PageRank to optionally include zero-degree vertices 
(the performance cost looks to be significant so this is disabled by default). 

I created FLINK-7277 to work on a weighted PageRank implementation. The greater 
challenge is integrating weighted graphs into the examples runner.

https://issues.apache.org/jira/browse/FLINK-7273
https://issues.apache.org/jira/browse/FLINK-7277
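
To illustrate what including zero-degree vertices means for the scores, here is a minimal sketch in plain Java. It is not the Gelly implementation and does not use its API. The class and method names, the damping factor of 0.85, and the 20 iterations are assumptions chosen for illustration. It runs an ordinary power iteration on the small graph from the thread below, spreads the rank held by vertices without out-edges uniformly, and keeps the self-loop 1->1 as given.

```java
import java.util.Arrays;

public class PageRankSketch {

    /**
     * Plain power-iteration PageRank over vertices 0..n-1 (hypothetical helper,
     * not the Gelly algorithm). Rank held by vertices without out-edges is
     * spread uniformly so that the scores keep summing to 1.
     */
    static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        // Count out-edges per vertex; zero-degree vertices simply stay at 0.
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        // Start from a uniform distribution.
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Total rank currently held by vertices with no out-edges.
            double sinkMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    sinkMass += rank[v];
                }
            }

            // Every vertex receives the random-jump share plus its share of the sink mass.
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * sinkMass / n);

            // Each edge passes an equal fraction of its source's rank to its target.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Thread graph without vertex 3; ids 1, 2, 4 mapped to 0, 1, 2.
        int[][] withoutZeroDegree = {{0, 1}, {0, 0}, {1, 0}, {1, 2}};
        // Same graph with vertex 3 kept; ids 1, 2, 3, 4 mapped to 0, 1, 2, 3.
        int[][] withZeroDegree = {{0, 1}, {0, 0}, {1, 0}, {1, 3}};

        double[] a = pageRank(3, withoutZeroDegree, 0.85, 20);
        double[] b = pageRank(4, withZeroDegree, 0.85, 20);

        System.out.println(Arrays.toString(a) + " sum=" + Arrays.stream(a).sum());
        System.out.println(Arrays.toString(b) + " sum=" + Arrays.stream(b).sum());
    }
}
```

In this sketch the scores sum to 1 (up to floating-point rounding) in both runs. With the zero-degree vertex included, it receives a small positive score and the other scores shrink accordingly.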

Greg


> On Jul 25, 2017, at 11:30 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
> 
> Hi Greg,
> 
> the zero-degree vertex "3" does not seem to be the cause.
> I removed this vertex from the graph and, in a second test, from my input file. 
> The ranking order is still different, and I guess wrong. Furthermore, the 
> sum of all ranks is not 1; it depends on the beta parameter. E.g., with a beta 
> of 0.15 the sg PageRank calculates
> (2.0 , 0.38102628032106706)
> (4.0 , 0.4547945998174918)
> (1.0 , 0.4341925979005684)
> 
> The sg version with a beta of 0.85 returns:
> (2.0 , 97.53826698457634)
> (4.0 , 140.49741661507886)
> (1.0 , 135.265886297257)
> 
> These issues affect the vertex-centric, sg and gsa implementations. The 
> last one (without any graph model) works fine.
> 
> Do you have any idea what I am doing wrong?
> 
> 
> Marc
> 
>> On 24.07.2017 at 20:56, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>> 
>> Thanks for your explanation.
>> 
>> The vertex-centric, sg and gsa PageRank need a Double as vertex value. A 
>> VertexDegree function generates a vertex with a LongValue as value.
>> Maybe I can iterate over the graph and remove all vertices with a degree of 
>> zero?!
>> 
>>> On 24.07.2017 at 16:36, Greg Hogan <c...@greghogan.com> wrote:
>>> 
>>> The current algorithm is unweighted though we should definitely look to add 
>>> a weighted variant and consider PersonalizedPageRank as well.
>>> 
>>> Looking at your results, PageRank scores should sum to 1.0, should be 
>>> positive unless the damping factor is 1.0, and use of the convergence 
>>> threshold will guarantee accurate results on large graphs.
>>> 
>>> The PageRank tests compare results against the NetworkX implementation. The 
>>> missing vertex 3 is trivially fixed by adding the call 
>>> ".setIncludeZeroDegreeVertices(true)" to the VertexDegrees function.
>>> 
>>> 
>>>> On Jul 23, 2017, at 6:38 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>>>> 
>>>> Hi Greg,
>>>> 
>>>> I am evaluating Gelly against GraphX (Spark). Both frameworks 
>>>> implement PageRank and Gelly provides a lot of variants (*thumbs up*).
>>>> In a really small initial test, the vertex-centric, scatter-gather and 
>>>> gsa versions give the same ranking order. Only the implementation in 
>>>> 1.3.X (without any graph model) computes a different ranking.
>>>> 
>>>> /* vertex centric */
>>>> DataSet<Vertex<Double, Double>> pagerankVC = small.run(new PageRank<>(0.5, 10));
>>>> System.err.println("VC");
>>>> pagerankVC.printToErr();
>>>> 
>>>> /* scatter gather */
>>>> DataSet<Vertex<Double, Double>> pageRankSG = small
>>>>     .run(new org.apache.flink.graph.library.PageRank<>(0.5, 10));
>>>> System.err.println("SG");
>>>> pageRankSG.printToErr();
>>>> 
>>>> /* gsa */
>>>> DataSet<Vertex<Double, Double>> pageRankGSA = small.run(new GSAPageRank<>(0.5, 10));
>>>> System.err.println("GSA");
>>>> pageRankGSA.printToErr();
>>>> 
>>>> /* without graph model */
>>>> DataSet<Result<Double>> pageRankDI = small
>>>>     .run(new PageRank<>(0.5, 10));
>>>> System.err.println("delta iteration");
>>>> pageRankDI.printToErr();
>>>> 
>>>> My input graph is:
>>>> vertices
>>>> id 1, val 0
>>>> id 2, val 0
>>>> id 3, val 0
>>>> id 4, val 0
>>>> edges
>>>> src 1, trg 2, val 3
>>>> src 1, trg 1, val 2
>>>> src 2, trg 1, val 3
>>>> src 2, trg 4, val 6
>>>> 
>>>> Ranking output
>>>> vertex-centric
>>>> id 4 with 1.16
>>>> id 1 with 1.103
>>>> id 2 with 0.815
>>>> id 3 with 0
>>>> sg and gsa
>>>> id 4 with 2.208
>>>> id 1 with 2.114
>>>> id 2 with 1.546
>>>> id 3 with 0
>>>> new PageRank in Gelly 1.3.X
>>>> id 1 with 0.392
>>>> id 2 with 0.313
>>>> id 4 with 0.294
>>>> 
>>>> Do you know why?
>>>> 
>>>> 
>>>> Best
>>>> Marc
>>>> 
>>>> 
>>>>> On 23.07.2017 at 02:22, Greg Hogan <c...@greghogan.com> wrote:
>>>>> 
>>>>> Hi Marc,
>>>>> 
>>>>> PageRank and GSAPageRank were moved to the flink-gelly-examples jar in 
>>>>> the org.apache.flink.graph.examples package. A library algorithm was 
>>>>> added that supports both source and sink vertices. The old algorithms 
>>>>> lacked this support, a limitation that was noted in their class 
>>>>> documentation and that I understand to be an effect of delta iterations. 
>>>>> The new implementation is also significantly faster 
>>>>> (https://github.com/apache/flink/pull/2733#issuecomment-278789830).
>>>>> 
>>>>> PageRank can be run using the examples jar from the command line, for 
>>>>> example (don’t wildcard the jar file as in the documentation until we get 
>>>>> the javadoc jar removed from the next release).
>>>>> 
>>>>> $ mv opt/flink-gelly* lib/
>>>>> $ ./bin/flink run examples/gelly/flink-gelly-examples_2.11-1.3.1.jar \
>>>>>     --algorithm PageRank \
>>>>>     --input CSV --type integer --simplify directed \
>>>>>     --input_filename <filename> --input_field_delimiter $'\t' \
>>>>>     --output print
>>>>> 
>>>>> The output can also be written to CSV in similar fashion to the input.
>>>>> 
>>>>> The code to call the library PageRank from the examples driver is the 
>>>>> same as for any GraphAlgorithm 
>>>>> (https://github.com/apache/flink/blob/release-1.3/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/PageRank.java):
>>>>> 
>>>>> graph.run(new PageRank<K, VV, EV>(dampingFactor, iterations, convergenceThreshold));
>>>>> 
>>>>> Please let us know of any issues or additional questions!
>>>>> 
>>>>> Greg
>>>>> 
>>>>> 
>>>>>> On Jul 22, 2017, at 4:33 PM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>>>>>> 
>>>>>> Hi there,
>>>>>> 
>>>>>> why was the PageRank version (which implements the GraphAlgorithm 
>>>>>> interface) removed in 1.3?
>>>>>> 
>>>>>> How can I use the new PageRank implementation in 1.3.x?
>>>>>> 
>>>>>> Why doesn't PageRank use the graph processing models (vertex-centric, sg 
>>>>>> or gsa) anymore?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Best,
>>>>>> marc
>> 
> 
