I think the most common motivation for a customized partitioner is to make sure some messages always go to the same partition, but people seldom care which partition exactly that is. If that is true, why not just assign the same byte array as the partition key and rely on the default hash-based partitioning in option 1.A? But again, that is based on my presumption that very few users really want to specify the partition id.
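To illustrate the keyed-send idea, here is a sketch of what that would look like for the caller. The KafkaProducer / ProducerRecord shapes, constructors, and config keys below are just my assumptions about the proposed API, not its final form.

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    // Sketch against the proposed producer API; the producer classes' package
    // and constructors were still in flux, so imports for them are omitted and
    // the names here are assumptions.
    public class SameKeyExample {
        public static void main(String[] args) {
            Properties config = new Properties();
            config.put("broker.list", "localhost:9092"); // placeholder config key
            KafkaProducer producer = new KafkaProducer(config);

            // Every record that should stay together reuses the same key bytes,
            // so the default hash-based partitioner maps them all to one
            // partition without the caller ever naming a partition id.
            byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
            producer.send(new ProducerRecord("events", key, "first".getBytes(StandardCharsets.UTF_8)));
            producer.send(new ProducerRecord("events", key, "second".getBytes(StandardCharsets.UTF_8)));
            producer.close();
        }
    }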
On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Hey Tom,
>
> Agreed, there is definitely nothing that prevents our including partitioner implementations, but it does get a little less seamless.
>
> -Jay
>
> On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <tombrow...@gmail.com> wrote:
> >
> > Regarding partitioning APIs, I don't think there is a common subset of information that is required for all strategies. Instead of modifying the core API to easily support all of the various partitioning strategies, offer the most common ones as libraries they can build into their own data pipeline, just like serialization. The core API would simply accept a partition index. You could include one default strategy (random) that only applies if they set "-1" for the partition index.
> >
> > That way, each partitioning strategy could have its own API that makes sense for it. For example, a round-robin partitioner only needs one method: "nextPartition()", while a hash-based one needs "getPartitionFor(byte[])".
> >
> > For those who actually need a pluggable strategy, a superset of the API could be codified into an interface (perhaps the existing partitioner interface), but it would still have to be used from outside of the core API.
> >
> > This design would make the core API less confusing (when do I use a partition key instead of a partition index, does the key overwrite the index, can the key be null, etc...?) while still providing the flexibility you want.
> >
> > --Tom
> >
> > On Fri, Jan 31, 2014 at 12:07 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >
> > > Oliver,
> > >
> > > Yeah that was my original plan--allow the registration of multiple callbacks on the future. But there is some additional implementation complexity because then you need more synchronization variables to ensure the callback gets executed even if the request has completed at the time the callback is registered. This also makes the order of callback execution unpredictable--I want to be able to guarantee that for a particular partition, callbacks for lower offset messages happen before callbacks for higher offset messages, so that if you set a highwater mark or something it is easy to reason about. This has the added benefit that callbacks execute in the I/O thread ALWAYS instead of it being non-deterministic, which is a little confusing.
> > >
> > > I thought a single callback is sufficient since you can always include multiple actions in that callback, and I think that case is rare anyway.
> > >
> > > I did think about the possibility of adding a thread pool for handling the callbacks. But there are a lot of possible configurations for such a thread pool, and a simplistic approach would no longer guarantee in-order processing of callbacks (you would need to hash execution over threads by partition id). I think by just exposing the simple method that executes in the I/O thread you can easily implement the pooled execution using the thread pooling mechanism of your choice by just having the callback use an executor to run the action (i.e. make an AsyncCallback that takes a threadpool and a Runnable or something like that). This gives the user full control over the executor (there are lots of details around thread re-use in executors, thread factories, etc. and trying to expose configs for every variation will be a pain). This also makes it totally transparent how it works; that is, if we did expose all kinds of thread pool configs you would still probably end up reading our code to figure out exactly what they all did.
> > >
> > > -Jay
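(To make Jay's suggestion concrete, a wrapper along these lines would do it. The Callback and RecordSend names below are only a guess at the proposed interface, not its actual shape.)

    import java.util.concurrent.ExecutorService;

    // Sketch only: Callback and RecordSend stand in for whatever completion
    // hook and result type the proposed producer API ends up with.
    public class AsyncCallback implements Callback {
        private final ExecutorService pool;
        private final Runnable action;

        public AsyncCallback(ExecutorService pool, Runnable action) {
            this.pool = pool;
            this.action = action;
        }

        // The producer invokes this on its I/O thread; we immediately hand the
        // real work off to the caller's pool so the I/O thread stays free.
        public void onCompletion(RecordSend send) {
            pool.submit(action);
        }
    }

A send would then look like producer.send(record, new AsyncCallback(pool, action)), and a caller who also needs per-partition ordering would hash actions to single-threaded executors by partition id, as Jay notes.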
> > > On Fri, Jan 31, 2014 at 9:39 AM, Oliver Dain <od...@3cinteractive.com> wrote:
> > > >
> > > > Hmmm.. I should read the docs more carefully before I open my big mouth: I just noticed the KafkaProducer#send overload that takes a callback. That definitely helps address my concern, though I think the API would be cleaner if there was only one variant that returned a future and you could register the callback with the future. This is not nearly as important as I'd thought given the ability to register a callback - just a preference.
> > > >
> > > > On 1/31/14, 9:33 AM, "Oliver Dain" <od...@3cinteractive.com> wrote:
> > > > >
> > > > > Hey all,
> > > > >
> > > > > I'm excited about having a new Producer API, and I really like the idea of removing the distinction between a synchronous and asynchronous producer. The one comment I have about the current API is that it's hard to write truly asynchronous code with the type of future returned by the send method. The issue is that send returns a RecordSend and there's no way to register a callback with that object. It is therefore necessary to poll the object periodically to see if the send has completed. So if you have n send calls outstanding you have to check n RecordSend objects, which is slow. In general this tends to lead to people using one thread per send call and then calling RecordSend#await, which removes much of the benefit of an async API.
> > > > >
> > > > > I think it's much easier to write truly asynchronous code if the returned future allows you to register a callback. That way, instead of polling you can simply wait for the callback to be called. A good example of the kind of thing I'm thinking of is the ListenableFuture class in the Guava libraries:
> > > > >
> > > > > https://code.google.com/p/guava-libraries/wiki/ListenableFutureExplained
> > > > >
> > > > > HTH,
> > > > > Oliver

--
-- Guozhang
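(For reference, the callback-registration style Oliver points to looks roughly like this with Guava's ListenableFuture. This is plain Guava usage, separate from the proposed Kafka API; the executor and task below are placeholders.)

    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import com.google.common.util.concurrent.ListeningExecutorService;
    import com.google.common.util.concurrent.MoreExecutors;
    import java.util.concurrent.Callable;
    import java.util.concurrent.Executors;

    public class ListenableFutureExample {
        public static void main(String[] args) {
            // Placeholder executor and task; the point is the callback below.
            ListeningExecutorService service =
                    MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(4));
            ListenableFuture<String> future = service.submit(new Callable<String>() {
                public String call() { return "done"; }
            });

            // Instead of polling the future, register a callback that fires
            // when the work completes or fails.
            Futures.addCallback(future, new FutureCallback<String>() {
                public void onSuccess(String result) { System.out.println(result); }
                public void onFailure(Throwable t) { t.printStackTrace(); }
            });

            service.shutdown();
        }
    }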