[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-3511:
---------------------------------
    Description:
Currently we have the following aggregation APIs in the Streams DSL:

{code}
KStream.aggregateByKey(..)
KStream.reduceByKey(..)
KStream.countByKey(..)
KTable.groupBy(...).aggregate(..)
KTable.groupBy(...).reduce(..)
KTable.groupBy(...).count(..)
{code}

It would be better to add common aggregation functions such as Sum and Avg as built-ins in the Streams DSL. A few questions to ask though:

1. Should we add those built-in functions as, for example, {{KTable.groupBy(...).sum(...)}} or {{KTable.groupBy(...).aggregate(SUM, ...)}}? Please see the comments below for detailed pros and cons.

2. If we go with the second option above, should we replace the countByKey / count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel it is not necessary: COUNT is a special aggregate function because it does not need to map on any value fields. This is the same approach as in Spark, where Count is built in as a first-class citizen of the DSL, and the others are built in as {{aggregate(SUM)}}, etc.

  was:Currently we only have one built-in aggregate function count() in the Kafka Streams DSL, but we want to add more aggregation functions like sum() and avg().

> Add common aggregation functions like Sum and Avg as build-ins in Kafka Streams DSL
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-3511
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3511
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Ishita Mandhan
>              Labels: api
>             Fix For: 0.10.1.0
>
>
> Currently we have the following aggregation APIs in the Streams DSL:
> {code}
> KStream.aggregateByKey(..)
> KStream.reduceByKey(..)
> KStream.countByKey(..)
> KTable.groupBy(...).aggregate(..)
> KTable.groupBy(...).reduce(..)
> KTable.groupBy(...).count(..)
> {code}
> It would be better to add common aggregation functions such as Sum and Avg as
> built-ins in the Streams DSL. A few questions to ask though:
> 1. Should we add those built-in functions as, for example,
> {{KTable.groupBy(...).sum(...)}} or {{KTable.groupBy(...).aggregate(SUM, ...)}}?
> Please see the comments below for detailed pros and cons.
> 2. If we go with the second option above, should we replace the countByKey /
> count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel
> it is not necessary: COUNT is a special aggregate function because it does not
> need to map on any value fields. This is the same approach as in Spark,
> where Count is built in as a first-class citizen of the DSL, and the others are
> built in as {{aggregate(SUM)}}, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
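
For illustration only, the two API shapes from question 1 could be compared with a small in-memory sketch like the one below. It does not use the Kafka Streams API: GroupedSketch, BuiltIn, Order and AggregationShapes are hypothetical names made up for this example, and the grouping is done over a plain list rather than a KTable. It only tries to show how a dedicated sum(...) operator and a generic aggregate(SUM, ...) operator might relate, and why count() needs no value mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

// Hypothetical names throughout: this is NOT the Kafka Streams API.
// Stand-in for the built-in functions passed to aggregate(SUM, ...).
enum BuiltIn { SUM, AVG }

// Stand-in for the result of KTable.groupBy(...), held in memory.
final class GroupedSketch<K, V> {
    private final Map<K, List<V>> groups = new LinkedHashMap<>();

    // Groups the records by a key selector; every group has at least one record.
    static <K, V> GroupedSketch<K, V> groupBy(List<V> records, Function<V, K> keyOf) {
        GroupedSketch<K, V> grouped = new GroupedSketch<>();
        for (V record : records) {
            grouped.groups.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }
        return grouped;
    }

    // Option 2: one generic operator, parameterized by the built-in function
    // and by a mapper that selects the value field to aggregate.
    Map<K, Double> aggregate(BuiltIn fn, ToDoubleFunction<V> field) {
        Map<K, Double> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> entry : groups.entrySet()) {
            double total = 0.0;
            for (V value : entry.getValue()) {
                total += field.applyAsDouble(value);
            }
            double result = (fn == BuiltIn.SUM) ? total : total / entry.getValue().size();
            out.put(entry.getKey(), result);
        }
        return out;
    }

    // Option 1: a dedicated operator; in this sketch it simply delegates to option 2.
    Map<K, Double> sum(ToDoubleFunction<V> field) {
        return aggregate(BuiltIn.SUM, field);
    }

    // COUNT is the special case: it takes no value mapper at all.
    Map<K, Long> count() {
        Map<K, Long> out = new LinkedHashMap<>();
        groups.forEach((key, values) -> out.put(key, (long) values.size()));
        return out;
    }
}

// Hypothetical record type used only to have a value field to map on.
final class Order {
    final String region;
    final double amount;

    Order(String region, double amount) {
        this.region = region;
        this.amount = amount;
    }
}

class AggregationShapes {
    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("eu", 10.0), new Order("us", 5.0), new Order("eu", 30.0));
        GroupedSketch<String, Order> byRegion = GroupedSketch.groupBy(orders, o -> o.region);

        System.out.println(byRegion.sum(o -> o.amount));                    // option 1
        System.out.println(byRegion.aggregate(BuiltIn.SUM, o -> o.amount)); // option 2
        System.out.println(byRegion.aggregate(BuiltIn.AVG, o -> o.amount)); // option 2, AVG
        System.out.println(byRegion.count());                               // no mapper needed
    }
}
```

In this sketch the two options differ only in surface area: option 1 adds one method per function, while option 2 keeps a single operator and grows the set of built-in functions instead. count() stays separate in both cases because it has nothing to map on, which matches the reasoning in question 2.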