Yes we are aware of this behavior and are working on optimizing it:

https://issues.apache.org/jira/browse/KAFKA-3101

More generally, we are considering to add a "trigger" interface similar to
the Millwheel model where users can customize when they want to emit
outputs to the downstream operators. Unfortunately for now there will no
easy workaround for buffering, and you may want to do this in app code (for
example, in a customized Processor where you can control when to call
context.forward() ).

Guozhang


On Tue, Apr 19, 2016 at 1:40 PM, Jeff Klukas <jklu...@simple.com> wrote:

> Is it true that the aggregation and reduction methods of KStream will emit
> a new output message for each incoming message?
>
> I have an application that's copying a Postgres replication stream to a
> Kafka topic, and activity tends to be clustered, with many updates to a
> given primary key happening in quick succession. I'd like to smooth that
> out by buffering the messages in tumbling windows, allowing the updates to
> overwrite one another, and emitting output messages only at the end of the
> window.
>
> Does the Kafka Streams API provide any hooks that I could use to achieve
> this kind of windowed "buffering" or "deduplication" of a stream?
>



-- 
-- Guozhang

Reply via email to