[ 
https://issues.apache.org/jira/browse/FLINK-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062215#comment-15062215
 ] 

ASF GitHub Bot commented on FLINK-2716:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1462#discussion_r47921105
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
    @@ -394,6 +396,21 @@ public long count() throws Exception {
                return res.<Long> getAccumulatorResult(id);
        }
     
    +   /**
    +    * Convenience method to get the count (number of elements) of a DataSet
    +    * as well as the checksum (sum over element hashes).
    +    *
    +    * @return A Checksum that represents the count and checksum of elements in the data set.
    +    */
    +   public Checksum checksum() throws Exception {
    +           final String id = new AbstractID().toString();
    +
    +           flatMap(new Utils.ChecksumHelper<T>(id)).name("checksum()")
    +                           .output(new DiscardingOutputFormat<NullValue>()).name("checksum() sink");
    --- End diff --
    
    Saves one operator and source of confusion in the UI. Actually, the `collect()` and `count()` should be similarly simplified, come to think of it ;-)


> Checksum method for DataSet and Graph
> -------------------------------------
>
>                 Key: FLINK-2716
>                 URL: https://issues.apache.org/jira/browse/FLINK-2716
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly, Java API, Scala API
>    Affects Versions: 0.10.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
>
> {{DataSet.count()}}, {{Graph.numberOfVertices()}}, and 
> {{Graph.numberOfEdges()}} provide measures of the number of distributed data 
> elements. New {{DataSet.checksum()}} and {{Graph.checksum()}} methods will 
> summarize the content of data elements and support algorithm validation, 
> integration testing, and benchmarking.
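
To make the "count plus sum over element hashes" idea from the description concrete, here is a minimal single-machine sketch in plain Java. It is not the Flink implementation from the pull request: the class `ChecksumSketch`, the holder `Result`, and the use of `Object.hashCode()` as the element hash are all assumptions made for illustration. Because addition is commutative, the result does not depend on element order, which is what makes such a summary usable on distributed data.

```java
import java.util.Arrays;
import java.util.List;

public class ChecksumSketch {

    /** Hypothetical result holder: element count plus the sum of element hashes. */
    public static final class Result {
        public final long count;
        public final long checksum;

        Result(long count, long checksum) {
            this.count = count;
            this.checksum = checksum;
        }
    }

    /**
     * Counts the elements and sums their hash codes.
     * Assumes non-null elements; hashCode() is an illustrative choice of hash.
     */
    public static <T> Result checksum(Iterable<T> elements) {
        long count = 0;
        long sum = 0;
        for (T element : elements) {
            count++;
            // The int hash is widened to long before being added.
            sum += element.hashCode();
        }
        return new Result(count, sum);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "z");
        List<String> b = Arrays.asList("z", "x", "y"); // same elements, different order

        Result ra = checksum(a);
        Result rb = checksum(b);

        System.out.println("count=" + ra.count + " checksum=" + ra.checksum);
        System.out.println("order-independent: "
                + (ra.count == rb.count && ra.checksum == rb.checksum));
    }
}
```

Two collections with equal count and checksum are probably, though not certainly, equal in content, since distinct elements can share a hash. That is generally good enough for validating algorithm output or comparing benchmark runs.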



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
