Re: New Producer Public API

Neha Narkhede Wed, 05 Feb 2014 10:24:52 -0800

Currently, the user will send ProducerRecords using the new producer. The
expectation will be that you get the same thing as output from the
consumer. Since ProduceRecord is a holder for topic, partition, key and
value, does it make sense to rename it to just Record? So, the send/receive
APIs would look like the following -


producer.send(Record record);
List<Record> poll();

Thoughts?


On Sun, Feb 2, 2014 at 4:12 PM, Guozhang Wang <[email protected]> wrote:

> I think the most common motivate of having a customized partitioner is to
> make sure some messages always go to the same partition, but people may
> seldom want to know about which partition exactly they go to. If that is
> true, why not just assign the same byte array as partition key with the
> default hash based partitioning in option 1.A? But again, that is based on
> my presumption that very few users would want to really specify the
> partition id.
>
>
>
> On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps <[email protected]> wrote:
>
> > Hey Tom,
> >
> > Agreed, there is definitely nothing that prevents our including
> partitioner
> > implementations, but it does get a little less seamless.
> >
> > -Jay
> >
> >
> > On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <[email protected]> wrote:
> >
> > > Regarding partitioning APIs, I don't think there is not a common subset
> > of
> > > information that is required for all strategies. Instead of modifying
> the
> > > core API to easily support all of the various partitioning strategies,
> > > offer the most common ones as libraries they can build into their own
> > data
> > > pipeline, just like serialization. The core API would simply accept a
> > > partition index. You could include one default strategy (random) that
> > only
> > > applies if they set "-1" for the partition index.
> > >
> > > That way, each partitioning strategy could have its own API that makes
> > > sense for it. For example, a round-robin partitioner only needs one
> > method:
> > > "nextPartition()", while a hash-based one needs
> > "getPartitionFor(byte[])".
> > >
> > > For those who actually need a pluggable strategy, a superset of the API
> > > could be codified into an interface (perhaps the existing partitioner
> > > interface), but it would still have to be used from outside of the core
> > > API.
> > >
> > > This design would make the core API less confusing (when do I use a
> > > partiton key instead of a partition index, does the key overwrite the
> > > index, can the key be null, etc...?) while still providing the
> > flexibility
> > > you want.
> > >
> > > --Tom
> > >
> > > On Fri, Jan 31, 2014 at 12:07 PM, Jay Kreps <[email protected]>
> wrote:
> > >
> > > > Oliver,
> > > >
> > > > Yeah that was my original plan--allow the registration of multiple
> > > > callbacks on the future. But there is some additional implementation
> > > > complexity because then you need more synchronization variables to
> > ensure
> > > > the callback gets executed even if the request has completed at the
> > time
> > > > the callback is registered. This also makes it unpredictable the
> order
> > of
> > > > callback execution--I want to be able to guarantee that for a
> > particular
> > > > partition callbacks for lower offset messages happen before callbacks
> > for
> > > > higher offset messages so that if you set a highwater mark or
> something
> > > it
> > > > is easy to reason about. This has the added benefit that callbacks
> > > execute
> > > > in the I/O thread ALWAYS instead of it being non-deterministic which
> > is a
> > > > little confusing.
> > > >
> > > > I thought a single callback is sufficient since you can always
> include
> > > > multiple actions in that callback, and I think that case is rare
> > anyway.
> > > >
> > > > I did think about the possibility of adding a thread pool for
> handling
> > > the
> > > > callbacks. But there are a lot of possible configurations for such a
> > > thread
> > > > pool and a simplistic approach would no longer guarantee in-order
> > > > processing of callbacks (you would need to hash execution over
> threads
> > by
> > > > partition id). I think by just exposing the simple method that
> executes
> > > in
> > > > the I/O thread you can easily implement the pooled execution using
> the
> > > > therad pooling mechanism of your choice by just having the callback
> use
> > > an
> > > > executor to run the action (i.e. make an AsyncCallback that takes a
> > > > threadpool and a Runnable or something like that). This gives the
> user
> > > full
> > > > control over the executor (there are lots of details around thread
> > re-use
> > > > in executors, thread factories, etc and trying to expose configs for
> > > every
> > > > variation will be a pain). This also makes it totally transparent how
> > it
> > > > works; that is if we did expose all kinds of thread pool configs you
> > > would
> > > > still probably end up reading our code to figure out exactly what
> they
> > > all
> > > > did.
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Fri, Jan 31, 2014 at 9:39 AM, Oliver Dain <
> [email protected]
> > > > >wrote:
> > > >
> > > > > Hmmm.. I should read the docs more carefully before I open my big
> > > mouth:
> > > > I
> > > > > just noticed the KafkaProducer#send overload that takes a callback.
> > > That
> > > > > definitely helps address my concern though I think the API would be
> > > > > cleaner if there was only one variant that returned a future and
> you
> > > > could
> > > > > register the callback with the future. This is not nearly as
> > important
> > > as
> > > > > I'd thought given the ability to register a callback - just a
> > > preference.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 1/31/14, 9:33 AM, "Oliver Dain" <[email protected]>
> wrote:
> > > > >
> > > > > >Hey all,
> > > > > >
> > > > > >I¹m excited about having a new Producer API, and I really like the
> > > idea
> > > > of
> > > > > >removing the distinction between a synchronous and asynchronous
> > > > producer.
> > > > > >The one comment I have about the current API is that it¹s hard to
> > > write
> > > > > >truly asynchronous code with the type of future returned by the
> send
> > > > > >method. The issue is that send returns a RecordSend and there¹s no
> > way
> > > > to
> > > > > >register a callback with that object. It is therefore necessary to
> > > poll
> > > > > >the object periodically to see if the send has completed. So if
> you
> > > > have n
> > > > > >send calls outstanding you have to check n RecordSend objects
> which
> > is
> > > > > >slow. In general this tends to lead to people using one thread per
> > > send
> > > > > >call and then calling RecordSend#await which removes much of the
> > > benefit
> > > > > >of an async API.
> > > > > >
> > > > > >I think it¹s much easier to write truly asynchronous code if the
> > > > returned
> > > > > >future allows you to register a callback. That way, instead of
> > polling
> > > > you
> > > > > >can simply wait for the callback to be called. A good example of
> the
> > > > kind
> > > > > >of thing I¹m thinking is the ListenableFuture class in the Guava
> > > > > >libraries:
> > > > > >
> > > > > >
> > > >
> > https://code.google.com/p/guava-libraries/wiki/ListenableFutureExplained
> > > > > >
> > > > > >
> > > > > >HTH,
> > > > > >Oliver
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: New Producer Public API

Reply via email to