I should have been clearer: I used Roshan's terminology in my reply. Basically, the old producer's "batch" send() just took a sequence of messages. I assumed Roshan is looking for something similar, which allows mixing messages destined for multiple partitions and can therefore fail for some messages and succeed for others.
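To make the per-record outcome concrete, here is a self-contained sketch of the callback pattern discussed in this thread. Note the assumptions: CompletableFuture is only a stand-in for the future returned by the producer's send(record, callback), and the completions below are simulated rather than real broker responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class MixedBatchOutcome {
    // Returns {succeededCount, failedCount} for a simulated 4-record batch.
    public static int[] run() {
        List<String> succeeded = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        String[] batch = {"rec-0", "rec-1", "rec-2", "rec-3"};

        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String rec : batch) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            // Per-record callback, analogous to the Callback argument of the
            // new producer's send(record, callback): each record reports its
            // own outcome, independent of the rest of the batch.
            f.whenComplete((ok, err) -> {
                if (err == null) succeeded.add(rec); else failed.add(rec);
            });
            futures.add(f);
        }

        // Simulated broker responses: one record fails while the others
        // succeed, so the "batch" neither fails nor succeeds as a whole.
        futures.get(0).complete(null);
        futures.get(1).completeExceptionally(new RuntimeException("leader not available"));
        futures.get(2).complete(null);
        futures.get(3).complete(null);

        return new int[] { succeeded.size(), failed.size() };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("succeeded=" + counts[0] + " failed=" + counts[1]);
    }
}
```

The point of the sketch is that once records can be spread across partitions, the only meaningful delivery status is per record, which is exactly why the callback option is recommended over a batch-level error.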
This is unrelated to MessageSet, which is for a specific partition and indeed fails or succeeds as a whole. For completeness: the internal RecordAccumulator component of the new KafkaProducer does manage a separate batch for each partition, and these batches should succeed or fail as a whole. I'm not sure I want to expose this level of implementation detail in our API, though.

Gwen

On Mon, Apr 27, 2015 at 2:36 PM, Magnus Edenhill <mag...@edenhill.se> wrote:
> Hi Gwen,
>
> can you clarify: by batch do you mean the protocol MessageSet, or some Java
> client internal construct?
> If the former, I was under the impression that a produced MessageSet either
> succeeds delivery or errors in its entirety on the broker.
>
> Thanks,
> Magnus
>
>
> 2015-04-27 23:05 GMT+02:00 Gwen Shapira <gshap...@cloudera.com>:
>
> > Batch failure is a bit meaningless, since in the same batch, some records
> > can succeed and others may fail.
> > To implement error handling logic (usually different from retry, since
> > the producer has a configuration controlling retries), we recommend using
> > the callback option of send().
> >
> > Gwen
> >
> > P.S.
> > Awesome seeing you here, Roshan :)
> >
> > On Mon, Apr 27, 2015 at 1:53 PM, Roshan Naik <ros...@hortonworks.com>
> > wrote:
> >
> > > The important guarantee that a client producer thread needs is an
> > > indication of the success or failure of the batch of events it pushed.
> > > Essentially, it needs to retry producer.send() on that same batch in
> > > case of failure. My understanding is that flush will simply flush data
> > > from all threads (correct me if I am wrong).
> > >
> > > -roshan
> > >
> > >
> > > On 4/27/15 1:36 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
> > >
> > > > This sounds like flush:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > >
> > > > which was recently implemented in trunk.
> > > >
> > > > Joel
> > > >
> > > > On Mon, Apr 27, 2015 at 08:19:40PM +0000, Roshan Naik wrote:
> > > > > I've been evaluating the performance of the old and new Producer
> > > > > APIs for reliable, high-volume streaming data movement. I do see one
> > > > > area of improvement that the new API could use for synchronous
> > > > > clients.
> > > > >
> > > > > AFAICT, the new API does not support batched synchronous transfers.
> > > > > To do a synchronous send, one needs to do a future.get() after every
> > > > > Producer.send(). I changed the new
> > > > > o.a.k.clients.tools.ProducerPerformance tool to assess the
> > > > > performance of this mode of operation. It may not be surprising that
> > > > > it is much slower than the async mode... hard to push it beyond
> > > > > 4 MB/s.
> > > > >
> > > > > The 0.8.1 Scala-based producer API supported a batched sync mode via
> > > > > Producer.send( List<KeyedMessage> ). My measurements show that it
> > > > > was able to approach (and sometimes exceed) the old async speeds...
> > > > > 266 MB/s.
> > > > >
> > > > > Supporting this batched sync mode is very critical for streaming
> > > > > clients (such as Flume, for example) that need delivery guarantees.
> > > > > Although it can be done with async mode, it requires additional
> > > > > bookkeeping as to which events are delivered and which ones are not.
> > > > > The programming model becomes much simpler with the batched sync
> > > > > mode. The client having to deal with one single future.get() helps
> > > > > performance greatly too, as I noted.
> > > > >
> > > > > I wanted to propose adding this as an enhancement to the new
> > > > > Producer API.
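The "additional bookkeeping" Roshan describes can be sketched as follows: send the whole batch asynchronously, block once for the whole batch (the role flush() plays in KIP-8), and collect only the records that still need a retry. As in the earlier thread, this is a self-contained simulation: CompletableFuture stands in for the future returned by producer.send(), and the failure rule (records ending in "-bad") is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class BatchedSyncSketch {
    // Sends a batch "asynchronously", then blocks once for the whole batch
    // and returns the records that still need a retry. The completion logic
    // is a simulation, not broker behaviour.
    public static List<String> sendBatch(List<String> batch) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String rec : batch) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            // Simulated outcome: records ending in "-bad" fail delivery.
            if (rec.endsWith("-bad")) {
                f.completeExceptionally(new RuntimeException("delivery failed"));
            } else {
                f.complete(null);
            }
            futures.add(f);
        }

        // One wait over the whole batch, instead of a blocking future.get()
        // after every single send; failed records are kept for a retry.
        List<String> needRetry = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            try {
                futures.get(i).get();
            } catch (InterruptedException | ExecutionException e) {
                needRetry.add(batch.get(i));
            }
        }
        return needRetry;
    }

    public static void main(String[] args) {
        List<String> retry = sendBatch(List.of("a", "b-bad", "c"));
        System.out.println("retry=" + retry);
    }
}
```

This keeps the pipelining benefit of async sends while still giving the caller a single synchronization point per batch, which is the behaviour the proposed batched sync mode would package up behind one call.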