[jira] [Updated] (FLINK-12671) Summarizer: summary statistics for Table

Xu Yang (JIRA) Fri, 31 May 2019 03:29:01 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xu Yang updated FLINK-12671:
----------------------------
    Description: 
We provide summary statistics for Table through Summarizer. Users can easily get 
the total count and the basic column-wise metrics: max, min, mean, variance, 
standardDeviation, normL1, normL2, the number of missing values, and the number 
of valid values.

SparkML provides the same functionality: 
[http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

 

 

Example:

String[] colNames = new String[] {"id", "height", "weight"};

Row[] data = new Row[] {
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5)
};

Table input = new MemSourceBatchOp(data, colNames).getTable();

TableSummary summary = new Summarizer(input).collectResult();

System.out.println(summary.mean("height")); // print the mean of the "height" column

System.out.println(summary);
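
To make the listed metrics concrete, here is a minimal plain-Java sketch of how 
they could be accumulated for one numeric column in a single pass. It is not the 
proposed Summarizer implementation and does not use any Flink API. The class 
ColumnStats and all of its members are hypothetical names chosen for 
illustration, null is assumed to mark a missing value, and variance is assumed 
to be the sample variance (divisor n - 1).

```java
import java.util.List;

/** Hypothetical helper (not part of Flink): one-pass summary of a numeric column. */
final class ColumnStats {
    long numValid;      // number of non-null values
    long numMissing;    // number of null values (assumed to mean "missing")
    double sum;         // sum of valid values
    double sumSquares;  // sum of squared valid values
    double sumAbs;      // sum of absolute valid values
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Accumulates all statistics in a single pass over the column. */
    static ColumnStats of(List<Double> column) {
        ColumnStats s = new ColumnStats();
        for (Double v : column) {
            if (v == null) {
                s.numMissing++;
                continue;
            }
            s.numValid++;
            s.sum += v;
            s.sumSquares += v * v;
            s.sumAbs += Math.abs(v);
            s.min = Math.min(s.min, v);
            s.max = Math.max(s.max, v);
        }
        return s;
    }

    /** Total number of rows, valid and missing. */
    long count() {
        return numValid + numMissing;
    }

    /** Mean of the valid values; NaN if there are none. */
    double mean() {
        return numValid == 0 ? Double.NaN : sum / numValid;
    }

    /** Sample variance (divisor n - 1, an assumption); NaN for fewer than two values. */
    double variance() {
        if (numValid < 2) {
            return Double.NaN;
        }
        // Clamp at zero to guard against small negative results from rounding.
        return Math.max(0.0, (sumSquares - sum * sum / numValid) / (numValid - 1));
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }

    /** L1 norm: sum of absolute values. */
    double normL1() {
        return sumAbs;
    }

    /** L2 norm: square root of the sum of squares. */
    double normL2() {
        return Math.sqrt(sumSquares);
    }
}
```

For the "height" column above (168, 165, 160, 163, 149), 
ColumnStats.of(...) gives a mean of 161.0 and a sample variance of 53.5. The 
sum-of-squares formula keeps the sketch short; a production implementation 
would more likely use a numerically stabler update such as Welford's method.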

 

 

  was:
We provide summary statistics for Table through Summarizer. User can easily get 
the total count and the basic column-wise metrics: max, min, mean, variance, 
standardDeviation, normL1, normL2, the number of missing values and the number 
of valid values.

SparkML has same function, 
[http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

 

 

Example:

 

String[] colNames = new String[] {"id", "height", "weight"};

Row[] data = new Row[] {
 Row.of(1, 168, 48.1),
 Row.of(2, 165, 45.8),
 Row.of(3, 160, 45.3),
 Row.of(4, 163, 41.9),
 Row.of(5, 149, 40.5),
};

Table input = new MemSourceBatchOp(data, colNames).getTable();

TableSummary summary = new Summarizer(input).collectResult();

System.out.println(summary.mean("height")); // print the mean of the 
column(Name: “age”)

System.out.println(summary);

 

 


> Summarizer: summary statistics for Table
> ----------------------------------------
>
>                 Key: FLINK-12671
>                 URL: https://issues.apache.org/jira/browse/FLINK-12671
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Xu Yang
>            Assignee: Xu Yang
>            Priority: Major
>
> We provide summary statistics for Table through Summarizer. Users can easily 
> get the total count and the basic column-wise metrics: max, min, mean, 
> variance, standardDeviation, normL1, normL2, the number of missing values, and 
> the number of valid values.
> SparkML provides the same functionality: 
> [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
>  
>  
> Example:
>  
> String[] colNames = new String[] {"id", "height", "weight"};
> Row[] data = new Row[] {
>     Row.of(1, 168, 48.1),
>     Row.of(2, 165, 45.8),
>     Row.of(3, 160, 45.3),
>     Row.of(4, 163, 41.9),
>     Row.of(5, 149, 40.5)
> };
> Table input = new MemSourceBatchOp(data, colNames).getTable();
> TableSummary summary = new Summarizer(input).collectResult();
> System.out.println(summary.mean("height")); // print the mean of the "height" column
> System.out.println(summary);
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
