[jira] [Commented] (FLINK-3780) Jaccard Similarity

ASF GitHub Bot (JIRA) Fri, 20 May 2016 07:20:57 -0700

    [ https://issues.apache.org/jira/browse/FLINK-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293429#comment-15293429 ]


ASF GitHub Bot commented on FLINK-3780:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1980#discussion_r64048334
  
    --- Diff: docs/apis/batch/libs/gelly.md ---
    @@ -2250,14 +2250,33 @@ graph.run(new TranslateVertexValues(new LongValueAddOffset(vertexCount)));
         </tr>
     
         <tr>
    -      <td>translate.<br/><strong>TranslateEdgeValues</strong></td>
    +      <td>asm.translate.<br/><strong>TranslateEdgeValues</strong></td>
           <td>
             <p>Translate edge values using the given <code>TranslateFunction</code>.</p>
     {% highlight java %}
     graph.run(new TranslateEdgeValues(new Nullify()));
     {% endhighlight %}
           </td>
         </tr>
    +
    +    <tr>
    +      <td>library.similarity.<br/><strong>JaccardIndex</strong></td>
    +      <td>
    +        <p>Measures the similarity between vertex neighborhoods. The Jaccard Index score is computed as the number of shared neighbors divided by the number of distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all neighbors are shared).</p>
    --- End diff --
    
    Why did you add this here and not in the "Usage" section of the library method?
    I find it a bit confusing. You describe graph algorithms as building blocks for other algorithms. Does the Jaccard Index fall into this category?


> Jaccard Similarity
> ------------------
>
>                 Key: FLINK-3780
>                 URL: https://issues.apache.org/jira/browse/FLINK-3780
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>             Fix For: 1.1.0
>
>
> Implement a Jaccard Similarity algorithm computing all non-zero similarity
> scores. This algorithm is similar to {{TriangleListing}}, but instead of
> joining two-paths against an edge list, we count two-paths.
> {{flink-gelly-examples}} currently has {{JaccardSimilarityMeasure}}, which
> relies on {{Graph.getTriplets()}} and so computes similarity scores only for
> neighbors, not for neighbors-of-neighbors.
> This algorithm is easily modified for other similarity scores such as
> Adamic-Adar similarity, where the sum of endpoint degrees is replaced by the
> degree of the middle vertex.
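
The two-path counting described in the issue can be illustrated with a small, plain-Java sketch. It is not the Gelly implementation and uses no Flink API: the class name `JaccardSketch`, the method `jaccardScores`, and the "u,v" string keys are all hypothetical, chosen only for this example. It assumes a simple undirected graph without self-loops. Each pair of neighbors of a middle vertex forms one two-path, so the number of two-paths between u and v equals their number of shared neighbors, and the number of distinct neighbors is the sum of the endpoint degrees minus that count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class JaccardSketch {

    /**
     * Returns all non-zero Jaccard scores, keyed by "u,v" with u < v.
     * Assumes a simple undirected graph without self-loops (illustrative only).
     */
    public static Map<String, Double> jaccardScores(int[][] edges) {
        // Build sorted adjacency sets; using sets also drops duplicate edges.
        Map<Integer, TreeSet<Integer>> neighbors = new TreeMap<>();
        for (int[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }

        // Count two-paths u - w - v: every pair of neighbors of a middle
        // vertex w has w as one shared neighbor.
        Map<String, Integer> shared = new TreeMap<>();
        for (TreeSet<Integer> adjacent : neighbors.values()) {
            List<Integer> list = new ArrayList<>(adjacent);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    shared.merge(list.get(i) + "," + list.get(j), 1, Integer::sum);
                }
            }
        }

        // distinct neighbors = deg(u) + deg(v) - shared neighbors
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : shared.entrySet()) {
            String[] pair = entry.getKey().split(",");
            int degU = neighbors.get(Integer.parseInt(pair[0])).size();
            int degV = neighbors.get(Integer.parseInt(pair[1])).size();
            int sharedCount = entry.getValue();
            scores.put(entry.getKey(), (double) sharedCount / (degU + degV - sharedCount));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 1.
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {1, 3}};
        jaccardScores(edges).forEach((pair, score) -> System.out.println(pair + " -> " + score));
    }
}
```

In this example graph, vertices 0 and 3 are not connected by an edge, yet they share neighbor 1 and so receive a score (1 shared of 2 distinct neighbors). That is the neighbors-of-neighbors case a triplet-based approach would miss.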



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
