[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example

ASF GitHub Bot (JIRA) Thu, 09 Jul 2015 08:37:45 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620693#comment-14620693
 ]


ASF GitHub Bot commented on FLINK-2310:
---------------------------------------

Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/892#issuecomment-120036933
  
    Hello @vasia 
    I would like to work on both versions of Adamic Adar. As the JIRA did not 
ask for an approximate version, it was suggested that I create another JIRA 
issue which will provide a library method for Adamic Adar which gives 
approximate solution with the use of bloom filters.
    I have a query about the bloom filters. Since bloom filters only tell us 
whether an element belongs to the set or not, if both the vertices have Bloom 
filters as value, how will we know what to emit? For Example for Vertex 3 
'1,4,13' are set and for Vertex 5 '2,4,13' are set. Now when we use the method 
suggested by you using logical AND we find out the intersection of the Bloom 
Filters. After this do you suggest that we keep another hashtable that keeps 
track of a value->vertex relation? Or do we just emit 5,4,1/log(d3) and keep 
the hashtable as an identity map function? That would mean each vertex has n 
number of bits as value , where n is the number of vertices in the graph. I 
hope I was clear in my query. TL;DR We will have to use an identity hash 
function which implies that each vertex will need n bits of memory as value. Is 
it okay to use this much memory? If there is some other approach then please 
let me know. Bloom filters seem to be more useful in finding size of the 
intersection or union but here we need to know which Vertices are common. The 
only other way that I can roughly imagine is that we get the hashed edges in a 
dataset, just like 5,4,1/(logd3)... Use the same hash function on all the graph 
edges. Then Join the datasets obtained over field 1 and 2. 
    Please tell me if there is any other efficient way or which one of these 
two you would prefer?



> Add an Adamic-Adar Similarity example
> -------------------------------------
>
>                 Key: FLINK-2310
>                 URL: https://issues.apache.org/jira/browse/FLINK-2310
>             Project: Flink
>          Issue Type: Task
>          Components: Gelly
>            Reporter: Andra Lungu
>            Assignee: Shivani Ghatge
>            Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, it's equal to log(1/numberOfEdges).
> The Adamic-Adar algorithm can be broken into three steps: 
> 1). For each vertex, compute the log of its inverse degrees (with the formula 
> above) and set it as the vertex value. 
> 2). Each vertex will then send this new computed value along with a list of 
> neighbors to the targets of its out-edges
> 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of 
> log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is 
> the degree of node n). See [2]
> Prerequisites: 
> - Full understanding of the Jaccard Similarity Measure algorithm
> - Reading the associated literature: 
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-2310) Add an Adamic-Adar Similarity example

Reply via email to