Re: New Producer Public API

Jay Kreps Fri, 31 Jan 2014 14:46:28 -0800

Hey Tom,

Agreed, there is definitely nothing that prevents our including partitioner
implementations, but it does get a little less seamless.


-Jay


On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <tombrow...@gmail.com> wrote:

> Regarding partitioning APIs, I don't think there is not a common subset of
> information that is required for all strategies. Instead of modifying the
> core API to easily support all of the various partitioning strategies,
> offer the most common ones as libraries they can build into their own data
> pipeline, just like serialization. The core API would simply accept a
> partition index. You could include one default strategy (random) that only
> applies if they set "-1" for the partition index.
>
> That way, each partitioning strategy could have its own API that makes
> sense for it. For example, a round-robin partitioner only needs one method:
> "nextPartition()", while a hash-based one needs "getPartitionFor(byte[])".
>
> For those who actually need a pluggable strategy, a superset of the API
> could be codified into an interface (perhaps the existing partitioner
> interface), but it would still have to be used from outside of the core
> API.
>
> This design would make the core API less confusing (when do I use a
> partiton key instead of a partition index, does the key overwrite the
> index, can the key be null, etc...?) while still providing the flexibility
> you want.
>
> --Tom
>
> On Fri, Jan 31, 2014 at 12:07 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Oliver,
> >
> > Yeah that was my original plan--allow the registration of multiple
> > callbacks on the future. But there is some additional implementation
> > complexity because then you need more synchronization variables to ensure
> > the callback gets executed even if the request has completed at the time
> > the callback is registered. This also makes it unpredictable the order of
> > callback execution--I want to be able to guarantee that for a particular
> > partition callbacks for lower offset messages happen before callbacks for
> > higher offset messages so that if you set a highwater mark or something
> it
> > is easy to reason about. This has the added benefit that callbacks
> execute
> > in the I/O thread ALWAYS instead of it being non-deterministic which is a
> > little confusing.
> >
> > I thought a single callback is sufficient since you can always include
> > multiple actions in that callback, and I think that case is rare anyway.
> >
> > I did think about the possibility of adding a thread pool for handling
> the
> > callbacks. But there are a lot of possible configurations for such a
> thread
> > pool and a simplistic approach would no longer guarantee in-order
> > processing of callbacks (you would need to hash execution over threads by
> > partition id). I think by just exposing the simple method that executes
> in
> > the I/O thread you can easily implement the pooled execution using the
> > therad pooling mechanism of your choice by just having the callback use
> an
> > executor to run the action (i.e. make an AsyncCallback that takes a
> > threadpool and a Runnable or something like that). This gives the user
> full
> > control over the executor (there are lots of details around thread re-use
> > in executors, thread factories, etc and trying to expose configs for
> every
> > variation will be a pain). This also makes it totally transparent how it
> > works; that is if we did expose all kinds of thread pool configs you
> would
> > still probably end up reading our code to figure out exactly what they
> all
> > did.
> >
> > -Jay
> >
> >
> > On Fri, Jan 31, 2014 at 9:39 AM, Oliver Dain <od...@3cinteractive.com
> > >wrote:
> >
> > > Hmmm.. I should read the docs more carefully before I open my big
> mouth:
> > I
> > > just noticed the KafkaProducer#send overload that takes a callback.
> That
> > > definitely helps address my concern though I think the API would be
> > > cleaner if there was only one variant that returned a future and you
> > could
> > > register the callback with the future. This is not nearly as important
> as
> > > I'd thought given the ability to register a callback - just a
> preference.
> > >
> > >
> > >
> > >
> > >
> > > On 1/31/14, 9:33 AM, "Oliver Dain" <od...@3cinteractive.com> wrote:
> > >
> > > >Hey all,
> > > >
> > > >I¹m excited about having a new Producer API, and I really like the
> idea
> > of
> > > >removing the distinction between a synchronous and asynchronous
> > producer.
> > > >The one comment I have about the current API is that it¹s hard to
> write
> > > >truly asynchronous code with the type of future returned by the send
> > > >method. The issue is that send returns a RecordSend and there¹s no way
> > to
> > > >register a callback with that object. It is therefore necessary to
> poll
> > > >the object periodically to see if the send has completed. So if you
> > have n
> > > >send calls outstanding you have to check n RecordSend objects which is
> > > >slow. In general this tends to lead to people using one thread per
> send
> > > >call and then calling RecordSend#await which removes much of the
> benefit
> > > >of an async API.
> > > >
> > > >I think it¹s much easier to write truly asynchronous code if the
> > returned
> > > >future allows you to register a callback. That way, instead of polling
> > you
> > > >can simply wait for the callback to be called. A good example of the
> > kind
> > > >of thing I¹m thinking is the ListenableFuture class in the Guava
> > > >libraries:
> > > >
> > > >
> > https://code.google.com/p/guava-libraries/wiki/ListenableFutureExplained
> > > >
> > > >
> > > >HTH,
> > > >Oliver
> > > >
> > >
> > >
> >
>

Re: New Producer Public API

Reply via email to