[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298244#comment-15298244 ]
ASF GitHub Bot commented on KAFKA-3511:
---------------------------------------

GitHub user enothereska opened a pull request:

    https://github.com/apache/kafka/pull/1424

    KAFKA-3511: Initial commit for aggregators [WiP]

    Initial structure. Removed initialiser. Two simple aggregators.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/enothereska/kafka KAFKA-3511-sum-avg

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1424

----
commit 18416bb213b6eaa3fa5952af67dc5396204e247c
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-05-24T14:25:47Z

    Initial commit for aggregators

----

> Add common aggregation functions like Sum and Avg as build-ins in Kafka Streams DSL
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-3511
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3511
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Eno Thereska
>              Labels: api
>             Fix For: 0.10.1.0
>
>
> Currently we have the following aggregation APIs in the Streams DSL:
> {code}
> KStream.aggregateByKey(..)
> KStream.reduceByKey(..)
> KStream.countByKey(..)
> KTable.groupBy(...).aggregate(..)
> KTable.groupBy(...).reduce(..)
> KTable.groupBy(...).count(..)
> {code}
> It would be better to add common aggregation functions like Sum and Avg as
> built-ins in the Streams DSL. A few questions to ask, though:
> 1. Should we add those built-in functions as, for example,
> {{KTable.groupBy(...).sum(...)}} or {{KTable.groupBy(...).aggregate(SUM, ...)}}?
> Please see the comments below for detailed pros and cons.
> 2. If we go with the second option above, should we replace the countByKey /
> count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel
> it is not necessary, as COUNT is a special aggregate function: it does not
> need to map on any value fields. This is the same approach as in Spark,
> where Count is built in as a first-class citizen in the DSL, and others are
> built in as {{aggregate(SUM)}}, etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
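
The two shapes raised in question 1 of the quoted description can be contrasted with a small, self-contained sketch. Everything in it is hypothetical and written only for illustration: the `Grouped` and `Aggregate` types, the `SUM` constant, and the method signatures are assumptions, not Kafka Streams classes and not the contents of the pull request. It runs on plain in-memory maps rather than on streams or tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Kafka Streams.
public class AggregationOptions {

    /** A reusable aggregate function: an initial value plus a combiner. */
    static final class Aggregate<R> {
        final R initial;
        final BinaryOperator<R> combiner;

        Aggregate(R initial, BinaryOperator<R> combiner) {
            this.initial = initial;
            this.combiner = combiner;
        }
    }

    /** Assumed built-in constant used by the aggregate(SUM, ...) style. */
    static final Aggregate<Long> SUM = new Aggregate<>(0L, Long::sum);

    /** In-memory stand-in for the result of a groupBy(...) call. */
    static final class Grouped<K, V> {
        private final Map<K, List<V>> groups = new LinkedHashMap<>();

        void add(K key, V value) {
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        // Option 2: one generic operator that takes the function as an argument.
        <R> Map<K, R> aggregate(Aggregate<R> fn, Function<V, R> valueMapper) {
            Map<K, R> out = new LinkedHashMap<>();
            for (Map.Entry<K, List<V>> e : groups.entrySet()) {
                R acc = fn.initial;
                for (V v : e.getValue()) {
                    acc = fn.combiner.apply(acc, valueMapper.apply(v));
                }
                out.put(e.getKey(), acc);
            }
            return out;
        }

        // Option 1: a dedicated method per function, here delegating to option 2.
        Map<K, Long> sum(Function<V, Long> valueMapper) {
            return aggregate(SUM, valueMapper);
        }

        // count stays a dedicated operator: it needs no value mapper from the caller.
        Map<K, Long> count() {
            return aggregate(SUM, v -> 1L);
        }
    }

    public static void main(String[] args) {
        Grouped<String, Integer> g = new Grouped<>();
        g.add("a", 1);
        g.add("a", 2);
        g.add("b", 5);

        System.out.println(g.sum(Integer::longValue));            // {a=3, b=5}
        System.out.println(g.aggregate(SUM, Integer::longValue)); // {a=3, b=5}
        System.out.println(g.count());                            // {a=2, b=1}
    }
}
```

In this sketch the two options produce the same result and differ only in API surface: option 1 adds one method per function, while option 2 keeps a single operator and grows a set of constants. The `count()` method shows why COUNT might reasonably stay separate, since the caller supplies no value mapper.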