Xu Yang created FLINK-12671:
-------------------------------

             Summary: Summarizer: summary statistics for Table
                 Key: FLINK-12671
                 URL: https://issues.apache.org/jira/browse/FLINK-12671
             Project: Flink
          Issue Type: Sub-task
            Reporter: Xu Yang
            Assignee: Xu Yang


Summarizer provides summary statistics for a Table. Users can easily get 
the total count and the basic column-wise metrics: max, min, mean, variance, 
standardDeviation, normL1, normL2, the number of missing values, and the 
number of valid values.

Spark ML provides the same functionality: 
[http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

 

 

Example:

 

Table input = …

TableSummary summary = new Summarizer(input).collectResult();

// print the mean of the column named "age"
System.out.println(summary.mean("age"));

System.out.println(summary);
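
For reference, the column-wise metrics listed above could be computed in a single pass roughly as in the sketch below. This is plain Java and not the proposed Flink implementation: the class name ColumnSummary and its methods are hypothetical. The issue does not define the metrics precisely, so the sketch assumes that a missing value is a null entry, that variance is the sample variance (n - 1 denominator), and that mean and variance are undefined (NaN) when too few valid values are present.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-column summary; illustrates the metrics only, not Flink's API. */
final class ColumnSummary {
    private long numMissing;                        // null entries (assumed meaning of "missing")
    private long numValid;                          // non-null entries
    private double max = Double.NEGATIVE_INFINITY;
    private double min = Double.POSITIVE_INFINITY;
    private double sumAbs;                          // sum of |x|, i.e. the L1 norm
    private double sumSquares;                      // sum of x^2, used for the L2 norm
    private double mean;                            // running mean (Welford's method)
    private double m2;                              // running sum of squared deviations from the mean

    /** Builds the summary in one pass over the column values. */
    static ColumnSummary of(List<Double> column) {
        ColumnSummary s = new ColumnSummary();
        for (Double value : column) {
            if (value == null) {
                s.numMissing++;
                continue;
            }
            double x = value;
            s.numValid++;
            s.max = Math.max(s.max, x);
            s.min = Math.min(s.min, x);
            s.sumAbs += Math.abs(x);
            s.sumSquares += x * x;
            // Welford's update keeps mean and variance numerically stable.
            double delta = x - s.mean;
            s.mean += delta / s.numValid;
            s.m2 += delta * (x - s.mean);
        }
        return s;
    }

    long count() { return numMissing + numValid; }
    long numMissingValue() { return numMissing; }
    long numValidValue() { return numValid; }
    double max() { return max; }
    double min() { return min; }
    double mean() { return numValid == 0 ? Double.NaN : mean; }
    // Assumption: sample variance; undefined for fewer than two valid values.
    double variance() { return numValid < 2 ? Double.NaN : m2 / (numValid - 1); }
    double standardDeviation() { return Math.sqrt(variance()); }
    double normL1() { return sumAbs; }
    double normL2() { return Math.sqrt(sumSquares); }

    public static void main(String[] args) {
        // One column with a missing value in the middle.
        ColumnSummary age = ColumnSummary.of(Arrays.asList(1.0, 2.0, null, 3.0, 4.0));
        System.out.println("count=" + age.count() + ", mean=" + age.mean()
                + ", variance=" + age.variance());
    }
}
```

A single-pass update is used here because a table may be too large to scan twice; a two-pass computation of the variance would work equally well for small inputs.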

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
