
Sorry for taking so long to react. I am looking through this now as well...


On Wed, Mar 23, 2016 at 6:59 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:

> I wrote another design for a summarize() function on DataSet.
> https://issues.apache.org/jira/browse/FLINK-3664
> I think this would be a better place for me to start than working on
> generic Aggregations. (I could move ahead with it immediately, and there
> would be no tricky decisions if people more or less like the design.)
> Any support for a summarize() function?
>         // Summarize a DataSet of Tuples by collecting single-pass
>         // statistics for all columns.
>         // Example usage:
>         DataSet<Tuple3<Double, String, Boolean>> input = // [...]
>         Tuple3<DoubleColumnSummary, StringColumnSummary, BooleanColumnSummary>
>                 summary = input.summarize();
>         summary.f0.stddev();
>         summary.f1.maxStringLength();
> Thanks.
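The summarize() proposal depends on every column summary being computable in a single pass and combinable across parallel partitions. Below is a minimal sketch of what a String column summary could look like under that assumption. The class name `StringColumnSummary` and `maxStringLength()` come from the proposal, but the fields and the `add`/`merge` methods are hypothetical illustrations. They are not Flink API and not the design attached to FLINK-3664.

```java
/**
 * Hypothetical single-pass summary of a String column (illustration only,
 * not Flink's API). Each parallel partition builds its own instance with
 * add(); partial results are then combined with merge(), so the column is
 * scanned exactly once.
 */
class StringColumnSummary {
    private long nonNullCount = 0;
    private long nullCount = 0;
    private long emptyCount = 0;
    private long totalLength = 0;
    private int minLength = Integer.MAX_VALUE;
    private int maxLength = Integer.MIN_VALUE;

    /** Folds one value into the running statistics. */
    void add(String value) {
        if (value == null) {
            nullCount++;
            return;
        }
        int len = value.length();
        nonNullCount++;
        if (len == 0) {
            emptyCount++;
        }
        totalLength += len;
        minLength = Math.min(minLength, len);
        maxLength = Math.max(maxLength, len);
    }

    /** Combines a partial summary from another partition into this one. */
    void merge(StringColumnSummary other) {
        nonNullCount += other.nonNullCount;
        nullCount += other.nullCount;
        emptyCount += other.emptyCount;
        totalLength += other.totalLength;
        minLength = Math.min(minLength, other.minLength);
        maxLength = Math.max(maxLength, other.maxLength);
    }

    long getNonNullCount() { return nonNullCount; }
    long getNullCount() { return nullCount; }
    long getEmptyCount() { return emptyCount; }

    // The length statistics return null until at least one non-null value is seen.
    Integer maxStringLength() { return nonNullCount == 0 ? null : maxLength; }
    Integer minStringLength() { return nonNullCount == 0 ? null : minLength; }
    Double meanStringLength() {
        return nonNullCount == 0 ? null : (double) totalLength / nonNullCount;
    }
}
```

Because `merge` only uses counts, sums, minima, and maxima, the order in which partitions are combined does not change the result. That is the property that lets a summary like this run as a combinable reduce over a DataSet.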
> -----Original Message-----
> From: Lisonbee, Todd [mailto:todd.lison...@intel.com]
> Sent: Wednesday, March 23, 2016 9:46 AM
> To: dev@flink.apache.org
> Subject: Aggregation Design Questions
> Hello,
> I'm working on adding Standard Deviation and other functions to the list of
> Aggregations,
> https://issues.apache.org/jira/browse/FLINK-3613
> Unfortunately, I didn't get very far because the general design of
> Aggregation on DataSets needs to change, and each solution seems to have
> drawbacks. For example, one easy solution would be to modify
> AggregateOperator to extend CustomUnaryOperation, but that seems weird
> because then it wouldn't be an Operator.
> I wrote a design explaining some of the current limitations and
> background,
> https://issues.apache.org/jira/secure/attachment/12794820/DataSet-Aggregation-Design-March2016-v1.txt
> The design is in progress.  I wanted to check in with people before going
> much further.
> I'd appreciate any feedback.
> Thanks,
> Todd
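Standard deviation is a useful test case for the aggregation redesign discussed in FLINK-3613. Sum, min, and max can be computed by keeping a single running value of the result type. Standard deviation cannot: each partition has to carry intermediate state (a count, a running mean, and a sum of squared deviations), and those states must be merged before the final value is produced. The sketch below assumes a hypothetical `Aggregator<IN, ACC, OUT>` contract that separates the accumulator type from the result type. It is not an existing Flink interface and not the design in the attached document. The per-element update uses Welford's algorithm, and the merge uses the pairwise combination formula of Chan et al.

```java
/**
 * Hypothetical aggregation contract (not an existing Flink interface) in which
 * the intermediate accumulator type ACC may differ from both the input type IN
 * and the result type OUT.
 */
interface Aggregator<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(ACC acc, IN value);
    ACC merge(ACC a, ACC b);
    OUT getResult(ACC acc);
}

/** Accumulator: count, running mean, and sum of squared deviations (M2). */
final class StdDevAcc {
    final long count;
    final double mean;
    final double m2;

    StdDevAcc(long count, double mean, double m2) {
        this.count = count;
        this.mean = mean;
        this.m2 = m2;
    }
}

/** Sample standard deviation via Welford's update and Chan et al.'s merge. */
final class StdDevAggregator implements Aggregator<Double, StdDevAcc, Double> {
    public StdDevAcc createAccumulator() {
        return new StdDevAcc(0, 0.0, 0.0);
    }

    public StdDevAcc add(StdDevAcc acc, Double value) {
        long n = acc.count + 1;
        double delta = value - acc.mean;
        double mean = acc.mean + delta / n;
        // Uses the old and the updated mean, which keeps M2 numerically stable.
        double m2 = acc.m2 + delta * (value - mean);
        return new StdDevAcc(n, mean, m2);
    }

    public StdDevAcc merge(StdDevAcc a, StdDevAcc b) {
        if (a.count == 0) return b;
        if (b.count == 0) return a;
        long n = a.count + b.count;
        double delta = b.mean - a.mean;
        double mean = a.mean + delta * b.count / n;
        double m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n;
        return new StdDevAcc(n, mean, m2);
    }

    public Double getResult(StdDevAcc acc) {
        // Sample standard deviation (n - 1 denominator); undefined for fewer than two values.
        return acc.count < 2 ? Double.NaN : Math.sqrt(acc.m2 / (acc.count - 1));
    }
}
```

Splitting the input into partitions, accumulating each one with `add`, and combining the results with `merge` gives the same answer as a sequential pass. For example, {2, 4, 4, 4} and {5, 5, 7, 9} merge to a mean of 5 and an M2 of 32. A combinable DataSet aggregation would need exactly this property.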
