Here's what I think:

# The new producer returns Java futures, and we all know the problems with Java futures (they cannot compose, they block, and they do not work well with other JVM languages/libraries - RxJava/RxScala, etc.).
# Or we can pass in a callback - this works okay when we are dealing with single messages, but with a batch of messages it pushes a lot of bookkeeping onto the caller. The client now has to either coordinate one callback per producer.send(), or use a single stateful callback and handle synchronization across all the state generated by the callbacks. Granted, we can simplify the model because we know there is a single I/O thread that runs the callbacks, but then we are relying on an implementation detail. It does not feel very clean.

Overall, when I have to send a bunch of messages synchronously, the new producer does not give me a good way to model it. It feels like the new producer is more prescriptive. If the producer had just one more API that took a list of messages and handed me back a single callback for that list, things would be much simpler.

On Mon, Aug 17, 2015 at 10:41 PM, Kishore Senji <kse...@gmail.com> wrote:

> If linger.ms is 0, batching does not add to the latency. It will actually
> improve throughput without affecting latency. Enabling batching does not
> mean it will wait for the batch to be full. Whatever gets filled during the
> previous batch send will be sent in the current batch even if its count is
> less than batch.size.
>
> You do not have to work with Future. With a callback you essentially get an
> async model (and you can make use of it if your webservice is using Servlet
> 3.0):
>
> producer.send(record, new AsyncCallback(request, response));
>
> static final class AsyncCallback implements Callback {
>
>     HttpServletRequest request;
>     HttpServletResponse response;
>
>     AsyncCallback(HttpServletRequest request, HttpServletResponse response) {
>         this.request = request;
>         this.response = response;
>     }
>
>     public void onCompletion(RecordMetadata metadata, java.lang.Exception exception) {
>         // Check the exception and send the appropriate response
>     }
> }
>
> On Mon, Aug 17, 2015 at 10:49 AM Neelesh <neele...@gmail.com> wrote:
>
> > Thanks for the answers. Indeed, the callback model is the same regardless
> > of batching.
> > But for a synchronous web service, batching creates a latency issue.
> > linger.ms is set to zero by default. Also, Java futures are hard to work
> > with compared to Scala futures. The current API also returns one future
> > per single-record send (correct me if I missed another variant), which
> > leaves the client code to deal with hundreds of futures and/or callbacks.
> > Maybe I'm missing something very obvious in the new API, but this model,
> > and the fact that the Scala APIs are going away, makes writing an
> > ingestion service in front of Kafka more involved than with the 0.8.1 API.
> >
> > On Sun, Aug 16, 2015 at 12:02 AM, Kishore Senji <kse...@gmail.com> wrote:
> >
> > > Adding to what Gwen already mentioned -
> > >
> > > The programming model for the Producer is send() with an optional
> > > callback, and we get a Future. This model does not change whether
> > > batching is done behind the scenes or not. So your fault-tolerance
> > > logic really should not depend on whether batching is done over the
> > > wire for performance reasons. Assuming that you will get better fault
> > > tolerance without batching is also not accurate, as you have to check
> > > whether you got an exception in onCompletion() either way.
> > >
> > > The webservice should have a callback registered (through which you
> > > essentially get an async model) for every send(), and based on that it
> > > should respond to its clients whether the call was successful or not.
> > > The clients of your webservice should have fault tolerance built on
> > > top of your response codes.
> > >
> > > I think batching is a good thing: you get better throughput, and if
> > > you do not have linger.ms set, it does not wait until the batch
> > > completely reaches batch.size, so all the concurrent requests to your
> > > webservice will get batched and sent to the broker, which increases
> > > the throughput of the Producer and in turn of your webservice.
> > >
> > > On Fri, Aug 14, 2015 at 6:10 PM Gwen Shapira <g...@confluent.io> wrote:
> > >
> > > > Hi Neelesh :)
> > > >
> > > > The new producer has configuration for controlling the batch sizes.
> > > > By default, it will batch as much as possible without delay
> > > > (controlled by linger.ms) and without using too much memory
> > > > (controlled by batch.size).
> > > >
> > > > As mentioned in the docs, you can set batch.size to 0 to disable
> > > > batching completely if you want.
> > > >
> > > > It is worthwhile to consider using the producer callback to avoid
> > > > losing messages when the webservice crashes (for example, have the
> > > > webservice only consider messages as sent if the callback is
> > > > triggered for a successful send).
> > > >
> > > > You can read more information on batching here:
> > > > http://ingest.tips/2015/07/19/tips-for-improving-performance-of-kafka-producer/
> > > >
> > > > And some examples of how to produce data to Kafka with the new
> > > > producer, both with futures and callbacks, here:
> > > > https://github.com/gwenshap/kafka-examples/blob/master/SimpleCounter/src/main/java/com/shapira/examples/producer/simplecounter/DemoProducerNewJava.java
> > > >
> > > > Gwen
> > > >
> > > > On Fri, Aug 14, 2015 at 5:07 PM, Neelesh <neele...@gmail.com> wrote:
> > > >
> > > > > We are fronting all our Kafka requests with a simple web service
> > > > > (we do some additional massaging and writing to other stores as
> > > > > well). The new KafkaProducer in 0.8.2 seems very geared towards
> > > > > producer batching. Most of our payloads are single messages.
> > > > >
> > > > > Producer batching basically sets us up for lost messages if our
> > > > > web service goes down with unflushed messages in the producer.
> > > > >
> > > > > Another issue is when we have a batch of records.
> > > > > It looks like I have to call producer.send() for each record and
> > > > > deal with the individual futures returned.
> > > > >
> > > > > Are there any patterns for primarily single-message requests,
> > > > > without losing data? I understand the throughput will be low.
> > > > >
> > > > > Thanks!
> > > > > -Neelesh
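The list-in, single-completion-out API that Neelesh asks for can be approximated on the caller side with Java 8's CompletableFuture: wrap each per-record acknowledgement in a future and combine them with allOf, so the whole batch succeeds or fails as one unit. The sketch below is illustrative only - `send()` here is a hypothetical stand-in that completes on another thread, not the real `KafkaProducer.send(record, callback)`.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.*;

// Sketch of the caller-side bookkeeping discussed in the thread: adapt each
// per-record completion into a CompletableFuture, then combine them so the
// batch completes (or fails) as a single unit.
public class BatchSendSketch {

    // Hypothetical stand-in for KafkaProducer.send(record, callback). In real
    // code the producer's I/O thread would complete the future via the Callback;
    // here we complete on the common pool to keep the sketch self-contained.
    static CompletableFuture<String> send(String record) {
        return CompletableFuture.supplyAsync(() -> "ack:" + record);
    }

    // One future for the whole batch: completes with all acks in send order
    // once every record is acknowledged, or exceptionally on the first failure.
    static CompletableFuture<List<String>> sendBatch(List<String> records) {
        List<CompletableFuture<String>> futures =
                records.stream().map(BatchSendSketch::send).collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // safe: allOf guarantees completion
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) throws Exception {
        List<String> acks = sendBatch(Arrays.asList("a", "b", "c")).get(10, TimeUnit.SECONDS);
        System.out.println(acks); // prints [ack:a, ack:b, ack:c]
    }
}
```

With a real producer, the adapter keeps the same shape: create a future, pass `producer.send()` a callback that completes it normally on success and exceptionally when `onCompletion` receives a non-null exception, and return the future.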