Hi! Sorry for taking so long to react. I am looking through this now as well...
Stephan

On Wed, Mar 23, 2016 at 6:59 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:

> I wrote another design for a summarize() function on DataSet.
> https://issues.apache.org/jira/browse/FLINK-3664
>
> I think this would be a better place for me to start than working on
> generic Aggregations. (I could move ahead with it immediately and there
> are no tricky decisions if people more or less liked the design).
>
> Any support for a summarize() function?
>
> // Summarize a DataSet of Tuples by collecting single pass
> // statistics for all columns
> // example usage:
>
> DataSet<Tuple3<Double, String, Boolean>> input = // [...]
>
> Tuple3<DoubleColumnSummary, StringColumnSummary, BooleanColumnSummary>
>     summary = input.summarize()
> summary.getField(0).stddev()
> summary.getField(1).maxStringLength()
>
> Thanks.
>
>
> -----Original Message-----
> From: Lisonbee, Todd [mailto:todd.lison...@intel.com]
> Sent: Wednesday, March 23, 2016 9:46 AM
> To: dev@flink.apache.org
> Subject: Aggregation Design Questions
>
> Hello,
>
> I'm working on adding Standard Deviation and others to the list of
> Aggregations,
> https://issues.apache.org/jira/browse/FLINK-3613
>
> Unfortunately, I didn't get very far because the general design of
> Aggregation on DataSets needs to change and each solution seems to have
> drawbacks. For example, one easy solution would be to modify
> AggregateOperator to extend CustomUnaryOperation but that seems weird
> because then it wouldn't be an Operator.
>
> I wrote a design explaining some of the current limitations and
> background,
> https://issues.apache.org/jira/secure/attachment/12794820/DataSet-Aggregation-Design-March2016-v1.txt
>
> The design is in progress. I wanted to check in with people before going
> much further.
>
> I'd appreciate any feedback.
>
> Thanks,
>
> Todd
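
For reference, the "single pass statistics" mentioned in the quoted proposal could be computed roughly as follows. This is only a sketch under assumptions, not code from FLINK-3664 or from Flink itself: the class name RunningDoubleSummary and all of its methods are hypothetical, and it uses Welford's online algorithm plus a pairwise merge step, neither of which the proposal specifies. The merge step is included because per-partition summaries would presumably need to be combined.

```java
/**
 * Hypothetical single-pass summary for one numeric column.
 * Not part of Flink; names and structure are assumptions for illustration.
 */
final class RunningDoubleSummary {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one value into the summary (Welford's online update). */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    /** Combines another partial summary into this one. */
    void merge(RunningDoubleSummary other) {
        if (other.count == 0) {
            return; // nothing to combine
        }
        if (count == 0) {
            // This summary is empty, so adopt the other one's state.
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            min = other.min;
            max = other.max;
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        m2 = m2 + other.m2 + delta * delta * ((double) count * other.count / total);
        mean = mean + delta * other.count / total;
        count = total;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
    }

    long count() { return count; }
    double mean() { return mean; }
    double min() { return min; }
    double max() { return max; }

    /** Population standard deviation; NaN when no values were added. */
    double populationStddev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    /** Sample standard deviation; NaN when fewer than two values were added. */
    double sampleStddev() {
        return count < 2 ? Double.NaN : Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        RunningDoubleSummary summary = new RunningDoubleSummary();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            summary.add(v);
        }
        System.out.println("mean=" + summary.mean()
                + " stddev=" + summary.populationStddev());
    }
}
```

Each value is visited once and only a constant amount of state is kept per column, which is what would make a summarize() over all columns feasible in a single pass.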