Flávio, thanks for creating this KIP.

I think this "single-aggregation" use case is common enough that we should
consider how to support it efficiently: for example, in KSQL, which is built
on top of Streams, we've seen lots of query statements that are expected to
return a single row indicating the "total aggregate". See
https://github.com/confluentinc/ksql/issues/430 for details.

I've not read through https://issues.apache.org/jira/browse/KAFKA-6953, but
I'm wondering if we have discussed the option of supporting it in a
"pre-aggregate" manner: that is, we do partial aggregates on parallel tasks,
and then send the partially aggregated values via a single topic partition
for the final aggregate, to reduce the traffic on that single partition and
hence the final aggregate workload.
Of course, for non-commutative aggregates we'd probably need to provide
another API in addition to aggregate, like the `merge` function for
session-based aggregates, to let users customize how two partial aggregates
are merged into a single partial aggregate. What are its pros and cons
compared with the current proposal?
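
To make the "pre-aggregate" idea concrete, here is a rough sketch in plain
Java, with no Streams dependency. Everything in it is made up for
illustration and is not a proposed API: the class PreAggregateSketch, the
SumCount type, and the add / merge method names. It assumes an "average"
aggregate, where the partial aggregate (sum, count) has a different shape
from the input values, so folding in a value and merging two partials are
separate operations. It is also batch-style only; in Streams the partials
would be updated continuously, which this does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreAggregateSketch {

    /** Hypothetical partial aggregate for an average: running sum plus count. */
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Aggregator step: fold one raw value into a partial aggregate.
        SumCount add(long value) {
            return new SumCount(sum + value, count + 1);
        }

        // Merge step (illustrative name): combine two partial aggregates into one.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    // Stage 1: each parallel task aggregates only the values of its own partition.
    static SumCount aggregatePartition(List<Long> values) {
        SumCount agg = new SumCount(0, 0);
        for (long v : values) {
            agg = agg.add(v);
        }
        return agg;
    }

    // Stage 2: a single task receives one partial per partition and merges them.
    static SumCount mergePartials(List<SumCount> partials) {
        SumCount total = new SumCount(0, 0);
        for (SumCount p : partials) {
            total = total.merge(p);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three input partitions with made-up values.
        List<List<Long>> partitions = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L),
                Arrays.asList(4L, 4L));

        List<SumCount> partials = new ArrayList<>();
        for (List<Long> partition : partitions) {
            partials.add(aggregatePartition(partition));
        }

        SumCount total = mergePartials(partials);
        System.out.println(total.sum + " / " + total.count + " = " + total.average());
    }
}
```

In this sketch the single final task sees three partial values instead of
six raw records, which is the traffic reduction I have in mind.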


Guozhang

On Mon, Jun 25, 2018 at 3:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> This would be a useful feature.
>
> In the Public Interfaces section, the new method lacks a closing
> parenthesis.
>
> In the Proposed Changes section, if the order of the 3 bullets can match
> the order of the parameters of the new method, it would be easier to read.
>
> For Rejected Alternatives #2, can you add a sentence saying why it was
> rejected?
>
> Cheers
>
> On Mon, Jun 25, 2018 at 10:13 AM, Flávio Stutz <flaviost...@gmail.com>
> wrote:
>
> > Hey, guys, I've just started a KIP discussion here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-323%3A+Schedulable+KTable+as+Graph+source
> >
>



-- 
-- Guozhang
