Re: How to "buffer" a stream with high churn and output only at the end of a window?

Guozhang Wang Wed, 20 Apr 2016 15:18:15 -0700

Consumer' buffer does not depend on offset committing, once it is given
from the poll() call it is out of the buffer. If offsets are not committed,
then upon failover it will simply re-consumer these records again from
Kafka.


Guozhang

On Tue, Apr 19, 2016 at 11:34 PM, Henry Cai <h...@pinterest.com.invalid>
wrote:

> For the technique of custom Processor of holding call to context.forward(),
> if I hold it for 10 minutes, what does that mean for the consumer
> acknowledgement on source node?
>
> I guess if I hold it for 10 minutes, the consumer is not going to ack to
> the upstream queue, will that impact the consumer performance, will
> consumer's kafka client message buffer overflow when there is no ack in 10
> minutes?
>
>
> On Tue, Apr 19, 2016 at 6:10 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Yes we are aware of this behavior and are working on optimizing it:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3101
> >
> > More generally, we are considering to add a "trigger" interface similar
> to
> > the Millwheel model where users can customize when they want to emit
> > outputs to the downstream operators. Unfortunately for now there will no
> > easy workaround for buffering, and you may want to do this in app code
> (for
> > example, in a customized Processor where you can control when to call
> > context.forward() ).
> >
> > Guozhang
> >
> >
> > On Tue, Apr 19, 2016 at 1:40 PM, Jeff Klukas <jklu...@simple.com> wrote:
> >
> > > Is it true that the aggregation and reduction methods of KStream will
> > emit
> > > a new output message for each incoming message?
> > >
> > > I have an application that's copying a Postgres replication stream to a
> > > Kafka topic, and activity tends to be clustered, with many updates to a
> > > given primary key happening in quick succession. I'd like to smooth
> that
> > > out by buffering the messages in tumbling windows, allowing the updates
> > to
> > > overwrite one another, and emitting output messages only at the end of
> > the
> > > window.
> > >
> > > Does the Kafka Streams API provide any hooks that I could use to
> achieve
> > > this kind of windowed "buffering" or "deduplication" of a stream?
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: How to "buffer" a stream with high churn and output only at the end of a window?

Reply via email to