[ https://issues.apache.org/jira/browse/FLINK-12671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xu Yang updated FLINK-12671:
----------------------------
    Description:
We provide summary statistics for Table through Summarizer. Users can easily get the total count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values and the number of valid values.
SparkML has the same function: [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

Example:

String[] colNames = new String[] {"id", "height", "weight"};
Row[] data = new Row[]{
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5),
};
Table input = new MemSourceBatchOp(data, colNames).getTable();
TableSummary summary = new Summarizer(input).collectResult();
System.out.println(summary.mean("height")); // print the mean of the column (name: "height")
System.out.println(summary);

  was:
We provide summary statistics for Table through Summarizer. Users can easily get the total count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values and the number of valid values.
SparkML has the same function: [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

Example:

String[] colNames = new String[] {"id", "height", "weight"};
Row[] data = new Row[] {
    Row.of(1, 168, 48.1),
    Row.of(2, 165, 45.8),
    Row.of(3, 160, 45.3),
    Row.of(4, 163, 41.9),
    Row.of(5, 149, 40.5),
};
Table input = new MemSourceBatchOp(data, colNames).getTable();
TableSummary summary = new Summarizer(input).collectResult();
System.out.println(summary.mean("height")); // print the mean of the column (name: "height")
System.out.println(summary);

> Summarizer: summary statistics for Table
> ----------------------------------------
>
>                 Key: FLINK-12671
>                 URL: https://issues.apache.org/jira/browse/FLINK-12671
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Xu Yang
>            Assignee: Xu Yang
>            Priority: Major
>
> We provide summary statistics for Table through Summarizer. Users can easily
> get the total count and the basic column-wise metrics: max, min, mean,
> variance, standardDeviation, normL1, normL2, the number of missing values and
> the number of valid values.
> SparkML has the same function:
> [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]
>
>
> Example:
>
> String[] colNames = new String[] {"id", "height", "weight"};
> Row[] data = new Row[]{
>     Row.of(1, 168, 48.1),
>     Row.of(2, 165, 45.8),
>     Row.of(3, 160, 45.3),
>     Row.of(4, 163, 41.9),
>     Row.of(5, 149, 40.5),
> };
> Table input = new MemSourceBatchOp(data, colNames).getTable();
> TableSummary summary = new Summarizer(input).collectResult();
> System.out.println(summary.mean("height")); // print the mean of the column (name: "height")
> System.out.println(summary);
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
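
For reference, here is a minimal plain-Java sketch of how the column-wise metrics listed in the description could be accumulated in a single pass over one numeric column. It is not the Summarizer implementation proposed in this issue: the class name ColumnSummary and all of its members are hypothetical, and the choice of the unbiased sample variance (dividing by n - 1) is an assumption, since the issue does not say which definition is used. The heights are taken from the example above, with one null appended to show how missing values are counted.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Flink class): one-pass summary of a numeric column. */
class ColumnSummary {
    long count;      // all rows seen, including missing (null) values
    long numValid;   // rows with a non-null value
    double sum;
    double sumSquares;
    double normL1;   // sum of absolute values
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    /** Folds one cell into the running totals; null counts as missing. */
    void add(Double value) {
        count++;
        if (value == null) {
            return;
        }
        double x = value;
        numValid++;
        sum += x;
        sumSquares += x * x;
        normL1 += Math.abs(x);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long numMissing() {
        return count - numValid;
    }

    double mean() {
        return sum / numValid;
    }

    /** Sample variance with n - 1 in the denominator (an assumption of this sketch). */
    double variance() {
        if (numValid < 2) {
            return 0.0;
        }
        // The sum-of-squares form is simple but can lose precision on large values.
        double v = (sumSquares - sum * sum / numValid) / (numValid - 1);
        return Math.max(0.0, v);
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }

    double normL2() {
        return Math.sqrt(sumSquares);
    }

    static ColumnSummary of(List<Double> column) {
        ColumnSummary summary = new ColumnSummary();
        for (Double value : column) {
            summary.add(value);
        }
        return summary;
    }

    public static void main(String[] args) {
        // "height" column from the example, plus one missing value.
        ColumnSummary height =
            ColumnSummary.of(Arrays.asList(168.0, 165.0, 160.0, 163.0, 149.0, null));
        System.out.println(height.mean());       // 161.0
        System.out.println(height.variance());   // 53.5
        System.out.println(height.numMissing()); // 1
    }
}
```

Keeping only running totals (count, sum, sum of squares, min, max) means two partial summaries could be merged by adding their totals, which is one common reason to structure such statistics this way in a distributed setting.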