I should have been clearer: I used Roshan's terminology in my reply. Basically, the old producer's "batch" send() just took a sequence of messages. I assumed Roshan is looking for something similar, which allows mixing messages destined for multiple partitions and can therefore fail for some messages and succeed for others.
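To make the per-record outcome concrete, here is a self-contained sketch of the callback pattern discussed in this thread. Note the assumptions: CompletableFuture is only a stand-in for the future returned by the producer's send(record, callback), and the completions below are simulated rather than real broker responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class MixedBatchOutcome {
    // Returns {succeededCount, failedCount} for a simulated 4-record batch.
    public static int[] run() {
        List<String> succeeded = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        String[] batch = {"rec-0", "rec-1", "rec-2", "rec-3"};

        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String rec : batch) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            // Per-record callback, analogous to the Callback argument of the
            // new producer's send(record, callback): each record reports its
            // own outcome, independent of the rest of the batch.
            f.whenComplete((ok, err) -> {
                if (err == null) succeeded.add(rec); else failed.add(rec);
            });
            futures.add(f);
        }

        // Simulated broker responses: one record fails while the others
        // succeed, so the "batch" neither fails nor succeeds as a whole.
        futures.get(0).complete(null);
        futures.get(1).completeExceptionally(new RuntimeException("leader not available"));
        futures.get(2).complete(null);
        futures.get(3).complete(null);

        return new int[] { succeeded.size(), failed.size() };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("succeeded=" + counts[0] + " failed=" + counts[1]);
    }
}
```

The point of the sketch is that once records can be spread across partitions, the only meaningful delivery status is per record, which is exactly why the callback option is recommended over a batch-level error.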
This is unrelated to MessageSet, which is for a specific partition and indeed fails or succeeds as a whole. For completeness: the internal RecordAccumulator component of the new KafkaProducer does manage a separate batch for each partition, and these batches should succeed or fail as a whole. I'm not sure I want to expose this level of implementation detail in our API, though.

Gwen

On Mon, Apr 27, 2015 at 2:36 PM, Magnus Edenhill <mag...@edenhill.se> wrote:
> Hi Gwen,
>
> can you clarify: by batch do you mean the protocol MessageSet, or some Java
> client internal construct?
> If the former, I was under the impression that a produced MessageSet either
> succeeds delivery or errors in its entirety on the broker.
>
> Thanks,
> Magnus
>
>
> 2015-04-27 23:05 GMT+02:00 Gwen Shapira <gshap...@cloudera.com>:
>
> > Batch failure is a bit meaningless, since in the same batch, some records
> > can succeed and others may fail.
> > To implement error handling logic (usually different from retry, since
> > the producer has a configuration controlling retries), we recommend using
> > the callback option of send().
> >
> > Gwen
> >
> > P.S.
> > Awesome seeing you here, Roshan :)
> >
> > On Mon, Apr 27, 2015 at 1:53 PM, Roshan Naik <ros...@hortonworks.com>
> > wrote:
> >
> > > The important guarantee that a client producer thread needs is an
> > > indication of the success or failure of the batch of events it pushed.
> > > Essentially, it needs to retry producer.send() on that same batch in
> > > case of failure. My understanding is that flush will simply flush data
> > > from all threads (correct me if I am wrong).
> > >
> > > -roshan
> > >
> > >
> > > On 4/27/15 1:36 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote:
> > >
> > > > This sounds like flush:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-8+-+Add+a+flush+method+to+the+producer+API
> > > >
> > > > which was recently implemented in trunk.
> > > >
> > > > Joel
> > > >
> > > > On Mon, Apr 27, 2015 at 08:19:40PM +0000, Roshan Naik wrote:
> > > > > I've been evaluating the performance of the old and new Producer
> > > > > APIs for reliable, high-volume streaming data movement. I do see one
> > > > > area of improvement that the new API could use for synchronous
> > > > > clients.
> > > > >
> > > > > AFAICT, the new API does not support batched synchronous transfers.
> > > > > To do a synchronous send, one needs to do a future.get() after every
> > > > > Producer.send(). I changed the new
> > > > > o.a.k.clients.tools.ProducerPerformance tool to assess the
> > > > > performance of this mode of operation. It may not be surprising that
> > > > > it is much slower than the async mode... hard to push it beyond
> > > > > 4 MB/s.
> > > > >
> > > > > The 0.8.1 Scala-based producer API supported a batched sync mode via
> > > > > Producer.send( List<KeyedMessage> ). My measurements show that it
> > > > > was able to approach (and sometimes exceed) the old async speeds...
> > > > > 266 MB/s.
> > > > >
> > > > > Supporting this batched sync mode is very critical for streaming
> > > > > clients (such as Flume, for example) that need delivery guarantees.
> > > > > Although it can be done with async mode, it requires additional
> > > > > bookkeeping as to which events are delivered and which ones are not.
> > > > > The programming model becomes much simpler with the batched sync
> > > > > mode. The client having to deal with one single future.get() helps
> > > > > performance greatly too, as I noted.
> > > > >
> > > > > I wanted to propose adding this as an enhancement to the new
> > > > > Producer API.
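The "additional bookkeeping" Roshan describes can be sketched as follows: send the whole batch asynchronously, block once for the whole batch (the role flush() plays in KIP-8), and collect only the records that still need a retry. As in the earlier thread, this is a self-contained simulation: CompletableFuture stands in for the future returned by producer.send(), and the failure rule (records ending in "-bad") is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class BatchedSyncSketch {
    // Sends a batch "asynchronously", then blocks once for the whole batch
    // and returns the records that still need a retry. The completion logic
    // is a simulation, not broker behaviour.
    public static List<String> sendBatch(List<String> batch) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String rec : batch) {
            CompletableFuture<Void> f = new CompletableFuture<>();
            // Simulated outcome: records ending in "-bad" fail delivery.
            if (rec.endsWith("-bad")) {
                f.completeExceptionally(new RuntimeException("delivery failed"));
            } else {
                f.complete(null);
            }
            futures.add(f);
        }

        // One wait over the whole batch, instead of a blocking future.get()
        // after every single send; failed records are kept for a retry.
        List<String> needRetry = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            try {
                futures.get(i).get();
            } catch (InterruptedException | ExecutionException e) {
                needRetry.add(batch.get(i));
            }
        }
        return needRetry;
    }

    public static void main(String[] args) {
        List<String> retry = sendBatch(List.of("a", "b-bad", "c"));
        System.out.println("retry=" + retry);
    }
}
```

This keeps the pipelining benefit of async sends while still giving the caller a single synchronization point per batch, which is the behaviour the proposed batched sync mode would package up behind one call.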