[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279918#comment-15279918 ]
Vasia Kalavri commented on FLINK-3879:
--------------------------------------

Gelly has multiple implementations for some algorithms to showcase how the different iteration abstractions can be used. Also, for some graph inputs an implementation might perform better than another (e.g. scatter-gather vs gsa). That doesn't mean we should add multiple implementations for all new algorithms :)

Now regarding performance, I'm not quite sure that FLINK-3879 will perform better than FLINK-2044. I haven't looked at the PR in detail, but I saw that it uses a bulk iteration. That means that a new partial solution is generated in every iteration and we cannot take advantage of the asymmetric convergence (if any).

> Native implementation of HITS algorithm
> ---------------------------------------
>
>                 Key: FLINK-3879
>                 URL: https://issues.apache.org/jira/browse/FLINK-3879
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on
> the link information among a set of documents. The algorithm presumes that a
> good hub is a document that points to many others, and a good authority is a
> document that many documents point to."
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence,
> outputting both hub and authority scores, and completing in half the number
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
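
For readers unfamiliar with HITS, the hub and authority update described in the issue can be sketched in plain Python as below. This is not the Gelly code from FLINK-3879 or FLINK-2044 and it uses no Flink API; the function name `hits`, its parameters, the L2 normalization, and the convergence measure are all assumptions made for illustration. Like a bulk iteration, it recomputes the scores of every vertex in every round, which is the behaviour the comment contrasts with taking advantage of asymmetric convergence.

```python
import math


def _normalize(scores):
    """Scale scores to unit L2 norm (assumed normalization; left as-is if all zero)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {vertex: value / norm for vertex, value in scores.items()}


def hits(edges, max_iterations=100, tolerance=1e-9):
    """Illustrative HITS on a list of directed (source, target) edges.

    Returns (hub, authority, iterations). Hypothetical helper, not a Gelly API.
    """
    vertices = sorted({vertex for edge in edges for vertex in edge})
    hub = {vertex: 1.0 for vertex in vertices}
    authority = {vertex: 1.0 for vertex in vertices}
    iterations = 0

    for _ in range(max_iterations):
        iterations += 1

        # Authority score: sum of the hub scores of vertices pointing here.
        new_authority = {vertex: 0.0 for vertex in vertices}
        for source, target in edges:
            new_authority[target] += hub[source]

        # Hub score: sum of the freshly computed authority scores pointed to.
        new_hub = {vertex: 0.0 for vertex in vertices}
        for source, target in edges:
            new_hub[source] += new_authority[target]

        new_authority = _normalize(new_authority)
        new_hub = _normalize(new_hub)

        # Total absolute change across ALL vertices; every vertex is
        # recomputed each round, even those whose scores already settled.
        change = sum(abs(new_hub[v] - hub[v]) for v in vertices)
        change += sum(abs(new_authority[v] - authority[v]) for v in vertices)

        hub, authority = new_hub, new_authority
        if change < tolerance:
            break

    return hub, authority, iterations


if __name__ == "__main__":
    # Two pages, "a" and "b", both link to "c".
    hub, authority, iterations = hits([("a", "c"), ("b", "c")])
    print(hub, authority, iterations)
```

In this toy graph "c" ends up as the only authority while "a" and "b" share the hub score equally. A delta-style iteration would instead update only the vertices whose scores are still changing.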