Re: New Producer API - batched sync mode support

Jay Kreps Tue, 28 Apr 2015 17:31:25 -0700

Hey guys,

The locking argument is correct for very small records (< 50 bytes),
batching will help here because for small records locking becomes the big
bottleneck. I think these use cases are rare but not unreasonable.

Overall I'd emphasize that the new producer is way faster at virtually all
use cases. If there is a use case where that isn't true, let's look at it
in a data driven way by comparing the old producer to the new producer and
looking for any areas where things got worse.

I suspect the "reducing allocations" argument to be not a big thing. We do
a number of small per-message allocations and it didn't seem to have much
impact. I do think there are a couple of big producer memory optimizations
we could do by reusing the arrays in the accumulator in the serialization
of the request but I don't think this is one of them.

I'd be skeptical of any api that was too weird--i.e. introduces a new way
of partitioning, gives back errors on a per-partition rather than per
message basis (given that partitioning is transparent this is really hard
to think about), etc. Bad apis end up causing a ton of churn and just don't
end up being a good long term commitment as we change how the underlying
code works over time (i.e. we hyper optimize for something then have to
maintain some super weird api as it becomes hyper unoptimized for the
client over time).

Roshan--Flush works as you would hope, it blocks on the completion of all
outstanding requests. Calling get on the future for the request gives you
the associated error code back. Flush doesn't throw any exceptions because
waiting for requests to complete doesn't error, the individual requests
fail or succeed which is always reported with each request.

Ivan--The batches you send in the scala producer today actually aren't
truely atomic, they just get sent in a single request.

One tricky problem to solve when user's do batching is size limits on
requests. This can be very hard to manage since predicting the serialized
size of a bunch of java objects is not always obvious. This was repeatedly
a problem before.

-Jay

On Tue, Apr 28, 2015 at 4:51 PM, Ivan Balashov <ibalas...@gmail.com> wrote:

> I must agree with @Roshan – it's hard to imagine anything more intuitive
> and easy to use for atomic batching as old sync batch api. Also, it's fast.
> Coupled with a separate instance of producer per
> broker:port:topic:partition it works very well. I would be glad if it finds
> its way into new producer api.
>
> On a side-side-side note, could anyone confirm/deny if SimpleConsumer's
> fetchSize must be set at least as batch bytes (before or after
> compression), otherwise client risks not getting any messages?
>

Re: New Producer API - batched sync mode support

Reply via email to