Hi Marc,

FLINK-7273 updates Gelly's PageRank to optionally include zero-degree vertices (the performance cost looks to be significant, so this is disabled by default).

I created FLINK-7277 to work on a weighted PageRank implementation. The greater challenge is integrating weighted graphs into the examples runner.

https://issues.apache.org/jira/browse/FLINK-7273
https://issues.apache.org/jira/browse/FLINK-7277

Greg

> On Jul 25, 2017, at 11:30 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>
> Hi Greg,
>
> it seems that vertex "3" with no degree doesn't matter.
> I removed this vertex from the graph and, in a second test, from my input file.
> The ranking order is still different, and I guess wrong. Furthermore, the
> sum of all ranks is not 1; it depends on the beta parameter. E.g., with a beta
> of 0.15 the sg PageRank calculates
> (2.0 , 0.38102628032106706)
> (4.0 , 0.4547945998174918)
> (1.0 , 0.4341925979005684)
>
> The sg with a beta of 0.85 returns:
> (2.0 , 97.53826698457634)
> (4.0 , 140.49741661507886)
> (1.0 , 135.265886297257)
>
> All of these are issues of the vertex-centric, sg and gsa implementations. The
> last one (without any graph model) works fine.
>
> Do you have any idea what I am doing wrong?
>
>
> Marc
>
>> On 24.07.2017 at 20:56, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>>
>> Thanks for your explanation.
>>
>> The vertex-centric, sg and gsa PageRank need a Double as vertex value. A
>> VertexDegree function generates a vertex with a LongValue as value.
>> Maybe I can iterate over the graph and remove all vertices with a degree of
>> zero?!
>>
>>> On 24.07.2017 at 16:36, Greg Hogan <c...@greghogan.com> wrote:
>>>
>>> The current algorithm is unweighted, though we should definitely look to add
>>> a weighted variant and consider PersonalizedPageRank as well.
>>>
>>> Looking at your results, PageRank scores should sum to 1.0, should be
>>> positive unless the damping factor is 1.0, and use of the convergence
>>> threshold will guarantee accurate results on large graphs.
>>>
>>> The PageRank tests compare against results from the NetworkX implementation.
>>> The missing vertex 3 is trivially fixed by adding the call
>>> ".setIncludeZeroDegreeVertices(true)" to the VertexDegrees function.
>>>
>>>
>>>> On Jul 23, 2017, at 6:38 AM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>>>>
>>>> Hi Greg,
>>>>
>>>> I am doing an evaluation of Gelly and GraphX (Spark). Both frameworks
>>>> implement PageRank and Gelly provides a lot of variants (*thumbs up*).
>>>> During a really small initial test I get the same ranking result for the
>>>> vertex-centric, scatter-gather and gsa versions. Only the
>>>> implementation in 1.3.X (without any graph model) computed a different
>>>> result (ranking).
>>>>
>>>> /* vertex centric */
>>>> DataSet<Vertex<Double, Double>> pagerankVC = small.run(new PageRank<>(0.5, 10));
>>>> System.err.println("VC");
>>>> pagerankVC.printToErr();
>>>>
>>>> /* scatter gather */
>>>> DataSet<Vertex<Double, Double>> pageRankSG = small
>>>> .run(new org.apache.flink.graph.library.PageRank<>(0.5, 10));
>>>> System.err.println("SG");
>>>> pageRankSG.printToErr();
>>>>
>>>> /* gsa */
>>>> DataSet<Vertex<Double, Double>> pageRankGSA = small.run(new GSAPageRank<>(0.5, 10));
>>>> System.err.println("GSA");
>>>> pageRankGSA.printToErr();
>>>>
>>>> /* without graph model */
>>>> DataSet<Result<Double>> pageRankDI = small
>>>> .run(new PageRank<>(0.5, 10));
>>>> System.err.println("delta iteration");
>>>> pageRankDI.printToErr();
>>>>
>>>> My input graph is:
>>>> vertices
>>>> id 1, val 0
>>>> id 2, val 0
>>>> id 3, val 0
>>>> id 4, val 0
>>>> edges
>>>> src 1, trg 2, val 3
>>>> src 1, trg 1, val 2
>>>> src 2, trg 1, val 3
>>>> src 2, trg 4, val 6
>>>>
>>>> Ranking output
>>>> vertex-centric
>>>> id 4 with 1.16
>>>> id 1 with 1.103
>>>> id 2 with 0.815
>>>> id 3 with 0
>>>> sg and gsa
>>>> id 4 with 2.208
>>>> id 1 with 2.114
>>>> id 2 with 1.546
>>>> id 3 with 0
>>>> new PageRank in Gelly 1.3.X
>>>> id 1 with 0.392
>>>> id 2 with 0.313
>>>> id 4 with 0.294
>>>>
>>>> Do you know why?
>>>>
>>>>
>>>> Best
>>>> Marc
>>>>
>>>>
>>>>> On 23.07.2017 at 02:22, Greg Hogan <c...@greghogan.com> wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> PageRank and GSAPageRank were moved to the flink-gelly-examples jar in
>>>>> the org.apache.flink.graph.examples package. A library algorithm was
>>>>> added that supports both source and sink vertices. This limitation of the
>>>>> old algorithms was noted in the class documentation and I understand it to
>>>>> be an effect of delta iterations. The new implementation is also
>>>>> significantly faster
>>>>> (https://github.com/apache/flink/pull/2733#issuecomment-278789830).
>>>>>
>>>>> PageRank can be run using the examples jar from the command line, for
>>>>> example (don't wildcard the jar file as in the documentation until we get
>>>>> the javadoc jar removed from the next release).
>>>>>
>>>>> $ mv opt/flink-gelly* lib/
>>>>> $ ./bin/flink run examples/gelly/flink-gelly-examples_2.11-1.3.1.jar \
>>>>> --algorithm PageRank \
>>>>> --input CSV --type integer --simplify directed --input_filename <filename> --input_field_delimiter $'\t' \
>>>>> --output print
>>>>>
>>>>> The output can also be written to CSV in similar fashion to the input.
>>>>>
>>>>> The code to call the library PageRank from the examples driver is the same
>>>>> as with any GraphAlgorithm
>>>>> (https://github.com/apache/flink/blob/release-1.3/flink-libraries/flink-gelly-examples/src/main/java/org/apache/flink/graph/drivers/PageRank.java):
>>>>>
>>>>> graph.run(new PageRank<K, VV, EV>(dampingFactor, iterations, convergenceThreshold));
>>>>>
>>>>> Please let us know of any issues or additional questions!
>>>>>
>>>>> Greg
>>>>>
>>>>>
>>>>>> On Jul 22, 2017, at 4:33 PM, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote:
>>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> why was the PageRank version (which implements the GraphAlgorithm
>>>>>> interface) removed in 1.3?
>>>>>>
>>>>>> How can I use the new PageRank implementation in 1.3.x?
>>>>>>
>>>>>> Why doesn't PageRank use the graph processing models (vertex-centric, sg
>>>>>> or gsa) anymore?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best,
>>>>>> marc
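
For anyone who wants to sanity-check the point made in the thread that PageRank scores should sum to 1.0, here is a minimal, self-contained sketch of unweighted PageRank by power iteration on the four-vertex graph quoted above. This is not Gelly code and not the library implementation. It assumes NetworkX-style handling of vertices without out-edges (their rank is spread evenly over all vertices), ignores the edge values, and keeps the self-loop 1 -> 1 (the command-line example with "--simplify directed" would, as far as I know, drop it). The class name PageRankSketch, the method signature and the 0-based vertex numbering are hypothetical and only for illustration.

```java
import java.util.Arrays;

final class PageRankSketch {

    /**
     * Unweighted PageRank by power iteration (illustrative sketch only).
     * Vertices are numbered 0..n-1 and each edge is {source, target}.
     */
    static double[] pageRank(int n, int[][] edges, double damping,
                             int maxIterations, double threshold) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        // Start from the uniform distribution, which sums to 1.0.
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Rank held by vertices without out-edges (sinks and isolated vertices).
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }

            // Every vertex receives the random-jump share plus an equal
            // share of the dangling rank (assumption: NetworkX-style).
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);

            // Each vertex splits its damped rank evenly over its out-edges.
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }

            // Stop once the total change (L1 norm) drops below the threshold.
            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                delta += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (delta < threshold) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Vertex ids 1..4 from the thread are mapped to indices 0..3.
        // Edges: 1 -> 2, 1 -> 1, 2 -> 1, 2 -> 4 (edge values ignored).
        int[][] edges = {{0, 1}, {0, 0}, {1, 0}, {1, 3}};
        double[] rank = pageRank(4, edges, 0.85, 100, 1e-12);

        double sum = 0.0;
        for (int v = 0; v < rank.length; v++) {
            System.out.printf("id %d with %.4f%n", v + 1, rank[v]);
            sum += rank[v];
        }
        System.out.printf("sum = %.6f%n", sum);
    }
}
```

Because every iteration hands out exactly (1 - damping) as the random jump and damping as redistributed rank, the scores stay normalized for any damping factor, and the isolated vertex 3 still gets a small positive score when the damping factor is below 1.0. Sums far from 1.0, like the ones reported above for the older vertex-centric, sg and gsa versions, therefore point to a different normalization or to missing handling of sink vertices rather than to the damping factor itself.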