Re: New Producer Public API

Jay Kreps Wed, 29 Jan 2014 10:24:55 -0800

Yes, we will absolutely retain protocol compatibility with 0.8, though the
Java API will change. The prototype code I posted works with 0.8.


-Jay


On Wed, Jan 29, 2014 at 10:19 AM, Steve Morin <steve.mo...@gmail.com> wrote:

> Is the new producer API going to maintain protocol compatibility with the old
> version of the API under the hood?
>
> > On Jan 29, 2014, at 10:15, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > The challenge of directly exposing ProduceRequestResult is that the offset
> > provided is just the base offset and there is no way to know for a
> > particular message where it was in relation to that base offset because the
> > batching is transparent and non-deterministic. So I think we do need some
> > kind of per-message result.
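To make the base-offset problem concrete, here is a minimal sketch of what a per-message result could carry. The class names BatchResult and PerMessageResult and the relativeOffset field are hypothetical, invented for illustration; they are not taken from the prototype. The sketch assumes a message's offset is the batch's base offset plus the message's position within the batch.

```java
/** Hypothetical per-request result: one base offset shared by a whole batch. */
final class BatchResult {
    final long baseOffset;

    BatchResult(long baseOffset) {
        this.baseOffset = baseOffset;
    }
}

/** Hypothetical per-message handle that remembers its position in the batch. */
final class PerMessageResult {
    private final BatchResult batch;
    private final int relativeOffset; // assumed: 0 for the first message in the batch

    PerMessageResult(BatchResult batch, int relativeOffset) {
        this.batch = batch;
        this.relativeOffset = relativeOffset;
    }

    /** Offset of this particular message, derived from the shared base offset. */
    long offset() {
        return batch.baseOffset + relativeOffset;
    }
}

class PerMessageOffsetExample {
    public static void main(String[] args) {
        BatchResult batch = new BatchResult(1000L);
        PerMessageResult third = new PerMessageResult(batch, 2);
        System.out.println("offset of third message: " + third.offset());
    }
}
```

The caller never sees how messages were grouped; only the per-message handle knows where it landed in the batch.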
> >
> > I started with Future<RequestResult>, I think for the same reason you
> > prefer it, but I moved away from that for two reasons:
> > 1. When I actually wrote out some code samples of usage they were a little
> > ugly: checked exceptions, methods we can't implement, no helper methods,
> > etc.
> > 2. I originally intended to make the result of send work like a
> > ListenableFuture so that you would register the callback on the result
> > rather than as part of the call. I moved away from this primarily because
> > the implementation complexity was a little higher.
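The two callback styles mentioned here can be contrasted in a short sketch. It uses the JDK's CompletableFuture as a stand-in for a listenable result, which is a swapped-in technique and not what the prototype uses. Both send signatures and the fixed offset 42 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

class CallbackStylesExample {
    /** Style 1 (hypothetical signature): the callback is passed as part of the call. */
    static void send(byte[] value, BiConsumer<Long, Throwable> callback) {
        // Pretend the broker acknowledged the message at offset 42.
        callback.accept(42L, null);
    }

    /** Style 2 (hypothetical signature): the callback is registered on the returned result. */
    static CompletableFuture<Long> send(byte[] value) {
        return CompletableFuture.completedFuture(42L);
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        send(payload, (offset, error) ->
                System.out.println("in-call callback, offset " + offset));

        send(payload).whenComplete((offset, error) ->
                System.out.println("on-result callback, offset " + offset));
    }
}
```

With the second style, several layers of code can each attach their own callback to the same result without knowing about each other.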
> >
> > Whether or not the code prettiness on its own outweighs the familiarity of
> > a normal Future I don't know, but that was the evolution of my thinking.
> >
> > -Jay
> >
> >
> >> On Wed, Jan 29, 2014 at 10:06 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >>
> >> Hey Neha,
> >>
> >> Error handling in RecordSend works as in Future: you will get the
> >> exception, if there is one, from any of the accessor methods or await().
> >>
> >> The purpose of hasError was that you can write things slightly more simply
> >> (which some people expressed preference for):
> >>  if(send.hasError())
> >>    // do something
> >>  long offset = send.offset();
> >>
> >> Instead of the slightly longer:
> >> try {
> >>   long offset = send.offset();
> >> } catch (KafkaException e) {
> >>   // do something
> >> }
> >>
> >>
> >> On Wed, Jan 29, 2014 at 10:01 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >>
> >>> Regarding the use of Futures -
> >>>
> >>> Agree that there are some downsides to using Futures but both approaches
> >>> have some tradeoffs.
> >>>
> >>> - Standardization and usability
> >>> Future is a widely used and understood Java API and given that the
> >>> functionality that RecordSend hopes to provide is essentially that of
> >>> Future, I think it makes sense to expose a widely understood public API
> >>> for our clients. RecordSend, on the other hand, seems to provide some APIs
> >>> that are very similar to that of Future, in addition to exposing a bunch of
> >>> APIs that belong to ProduceRequestResult. As a user, I would've really
> >>> preferred to deal with ProduceRequestResult directly -
> >>> Future<ProduceRequestResult> send(...)
> >>>
> >>> - Error handling
> >>> RecordSend's error handling is quite unintuitive where the user has to
> >>> remember to invoke hasError and error, instead of just throwing the
> >>> exception. Now there are some downsides regarding error handling with the
> >>> Future as well, where the user has to catch InterruptedException when we
> >>> would never run into it. However, it seems like a price worth paying for
> >>> supporting a standard API and error handling.
> >>>
> >>> - Unused APIs
> >>> This is a downside of using Future, where the cancel() operation would
> >>> always return false and mean nothing. But we can mention that caveat in
> >>> our Java docs.
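The tradeoffs listed above can be seen in a small sketch of a standard java.util.concurrent.Future whose cancel() always returns false. The class name SendFuture and its complete/fail methods are hypothetical; this is one possible shape under those assumptions, not the producer's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical Future<Long> for a send; the name SendFuture is illustrative only. */
final class SendFuture implements Future<Long> {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Long offset;
    private volatile Exception error;

    /** Called by the (imaginary) I/O thread when the send succeeds. */
    void complete(long offset) { this.offset = offset; done.countDown(); }

    /** Called by the (imaginary) I/O thread when the send fails. */
    void fail(Exception error) { this.error = error; done.countDown(); }

    // Cancellation is not supported, so this always reports false.
    @Override public boolean cancel(boolean mayInterruptIfRunning) { return false; }
    @Override public boolean isCancelled() { return false; }
    @Override public boolean isDone() { return done.getCount() == 0; }

    @Override public Long get() throws InterruptedException, ExecutionException {
        done.await();
        return result();
    }

    @Override public Long get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (!done.await(timeout, unit)) throw new TimeoutException();
        return result();
    }

    private Long result() throws ExecutionException {
        if (error != null) throw new ExecutionException(error);
        return offset;
    }
}

class SendFutureExample {
    public static void main(String[] args) {
        SendFuture future = new SendFuture();
        future.complete(7L);
        try {
            System.out.println("offset = " + future.get());
        } catch (InterruptedException e) {
            // The checked exception the caller must handle even if it can never occur here.
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            System.out.println("send failed: " + e.getCause());
        }
    }
}
```

The caller-side try/catch in main is the overhead being weighed against the familiarity of the standard interface.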
> >>>
> >>> To summarize, I would prefer to expose a well understood and widely
> >>> adopted Java API and put up with the overhead of catching one unnecessary
> >>> checked exception, rather than wrap the useful ProduceRequestResult in a
> >>> custom async object (RecordSend) and explain that to our many users.
> >>>
> >>> Thanks,
> >>> Neha
> >>>
> >>>
> >>>
> >>>
> >>>> On Tue, Jan 28, 2014 at 8:10 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >>>>
> >>>> Hey Neha,
> >>>>
> >>>> Can you elaborate on why you prefer using Java's Future? The downside in
> >>>> my mind is the use of the checked InterruptedException and
> >>>> ExecutionException. ExecutionException is arguable, but forcing you to
> >>>> catch InterruptedException, often in code that can't be interrupted, seems
> >>>> perverse. It also leaves us with the cancel() method which I don't think
> >>>> we really can implement.
> >>>>
> >>>> Option 1A, to recap/elaborate, was the following. There is no Serializer
> >>>> or Partitioner api. We take a byte[] key and value and an optional integer
> >>>> partition. If you specify the integer partition it will be used. If you do
> >>>> not specify a key or a partition the partition will be chosen in a round
> >>>> robin fashion. If you specify a key but no partition we will choose a
> >>>> partition based on a hash of the key. In order to let the user find the
> >>>> partition we will need to give them access to the Cluster instance
> >>>> directly from the producer.
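The three selection rules of option 1A can be sketched as follows. The class name PartitionChooser, the method signature, and the use of Arrays.hashCode as the key hash are all assumptions made for illustration; the thread does not say which hash function would be used.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper illustrating the option 1A rules; not part of the proposed API. */
class PartitionChooser {
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param key           message key, or null if none was given
     * @param partition     explicit partition, or null if none was given
     * @param numPartitions number of partitions for the topic
     */
    int choose(byte[] key, Integer partition, int numPartitions) {
        if (partition != null) {
            return partition; // an explicit partition is always used as given
        }
        if (key != null) {
            // Assumed hash function; floorMod keeps the result non-negative.
            return Math.floorMod(Arrays.hashCode(key), numPartitions);
        }
        // Neither key nor partition: round robin across partitions.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }
}

class PartitionChooserExample {
    public static void main(String[] args) {
        PartitionChooser chooser = new PartitionChooser();
        System.out.println("explicit:    " + chooser.choose(null, 3, 4));
        System.out.println("round robin: " + chooser.choose(null, null, 4));
        System.out.println("key hash:    " + chooser.choose(new byte[] {1, 2, 3}, null, 4));
    }
}
```

A user who wants custom partitioning would compute the partition number themselves, using cluster metadata from the producer, and pass it as the explicit partition.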
> >>>>
> >>>> -Jay
> >>>>
> >>>>
> >>>> On Tue, Jan 28, 2014 at 6:25 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >>>>
> >>>>> Here are more thoughts on the public APIs -
> >>>>>
> >>>>> - I suggest we use java's Future instead of custom Future especially
> >>>>> since it is part of the public API
> >>>>>
> >>>>> - Serialization: I like the simplicity of the producer APIs with the
> >>>>> absence of serialization where we just deal with byte arrays for keys and
> >>>>> values. What I don't like about this is the performance overhead on the
> >>>>> Partitioner for any kind of custom partitioning based on the
> >>>>> partitionKey. Since the only purpose of partitionKey is to do custom
> >>>>> partitioning, why can't we take it in directly as an integer and let the
> >>>>> user figure out the mapping from partition_key -> partition_id using the
> >>>>> getCluster() API? If I understand correctly, this is similar to what you
> >>>>> suggested as part of option 1A. I like this approach since it maintains
> >>>>> the simplicity of APIs by allowing us to deal with bytes and does not
> >>>>> compromise performance in the custom partitioning case.
> >>>>>
> >>>>> Thanks,
> >>>>> Neha
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jan 28, 2014 at 5:42 PM, Jay Kreps <jay.kr...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> Hey Tom,
> >>>>>>
> >>>>>> That sounds cool. How did you end up handling parallel I/O if you
> >>> wrap
> >>>>> the
> >>>>>> individual connections? Don't you need some selector that selects
> >>> over
> >>>>> all
> >>>>>> the connections?
> >>>>>>
> >>>>>> -Jay
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jan 28, 2014 at 2:31 PM, Tom Brown <tombrow...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> I implemented a 0.7 client in pure java, and its API very closely
> >>>>>> resembled
> >>>>>>> this. (When multiple people independently engineer the same
> >>> solution,
> >>>>>> it's
> >>>>>>> probably good... right?). However, there were a few architectural
> >>>>>>> differences with my client:
> >>>>>>>
> >>>>>>> 1. The basic client itself was just an asynchronous layer around
> >>> the
> >>>>>>> different server functions. In and of itself it had no knowledge
> >>> of
> >>>>>>> partitions, only servers (and maintained TCP connections to them).
> >>>>>>>
> >>>>>>> 2. The main producer was an additional layer that provided a
> >>>> high-level
> >>>>>>> interface that could batch individual messages based on partition.
> >>>>>>>
> >>>>>>> 3. Knowledge of partitioning was done via an interface so that
> >>>>> different
> >>>>>>> strategies could be used.
> >>>>>>>
> >>>>>>> 4. Partitioning was done by the user, with knowledge of the
> >>> available
> >>>>>>> partitions provided by #3.
> >>>>>>>
> >>>>>>> 5. Serialization was done by the user to simplify the API.
> >>>>>>>
> >>>>>>> 6. Futures were used to make asynchronous emulate synchronous
> >>> calls.
> >>>>>>>
> >>>>>>>
> >>>>>>> The main benefit of this approach is flexibility. For example,
> >>> since
> >>>>> the
> >>>>>>> base client was just a managed connection (and not inherently a
> >>>>>> producer),
> >>>>>>> it was easy to composite a produce request and an offsets request
> >>>>>> together
> >>>>>>> into a confirmed produce request (officially not available in
> >>> 0.7).
> >>>>>>>
> >>>>>>> Decoupling the basic client from partition management allowed
> >>> me
> >>>> to
> >>>>>>> implement zk discovery as a separate project so that the main
> >>> project
> >>>>> had
> >>>>>>> no complex dependencies. The same was true of decoupling
> >>>> serialization.
> >>>>>>> It's trivial to build an optional layer that adds those features
> >>> in,
> >>>>>> while
> >>>>>>> allowing access to the base APIs for those that need it.
> >>>>>>>
> >>>>>>> Using standard Future objects was also beneficial, since I could
> >>>>> combine
> >>>>>>> them with existing tools (such as guava).
> >>>>>>>
> >>>>>>> It may be too late to be of use, but I have been working with my
> >>>>>> company's
> >>>>>>> legal department to release the implementation I described above.
> >>> If
> >>>>>> you're
> >>>>>>> interested in it, let me know.
> >>>>>>>
> >>>>>>>
> >>>>>>> To sum up my thoughts regarding the new API, I think it's a great
> >>>>> start.
> >>>>>> I
> >>>>>>> would like to see a more layered approach so I can use the parts I
> >>>>> want,
> >>>>>>> and adapt the other parts as needed. I would also like to see
> >>>> standard
> >>>>>>> interfaces (especially Future) used where they make sense.
> >>>>>>>
> >>>>>>> --Tom
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jan 28, 2014 at 1:33 PM, Roger Hoover <
> >>>> roger.hoo...@gmail.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> +1 ListenableFuture: If this works similar to Deferreds in
> >>> Twisted
> >>>>>> Python
> >>>>>>>> or Promised IO in Javascript, I think this is a great pattern
> >>> for
> >>>>>>>> decoupling your callback logic from the place where the Future
> >>> is
> >>>>>>>> generated.  You can register as many callbacks as you like,
> >>> each in
> >>>>> the
> >>>>>>>> appropriate layer of the code and have each observer get
> >>> notified
> >>>>> when
> >>>>>>> the
> >>>>>>>> promised i/o is complete without any of them knowing about each
> >>>>> other.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jan 28, 2014 at 11:32 AM, Jay Kreps <
> >>> jay.kr...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hey Ross,
> >>>>>>>>>
> >>>>>>>>> - ListenableFuture: Interesting. That would be an alternative
> >>> to
> >>>>> the
> >>>>>>>> direct
> >>>>>>>>> callback support we provide. There could be pros to this, let
> >>> me
> >>>>>> think
> >>>>>>>>> about it.
> >>>>>>>>> - We could provide layering, but I feel that the
> >>> serialization is
> >>>>>> such
> >>>>>>> a
> >>>>>>>>> small thing we should just make a decision and choose one, it
> >>>>> doesn't
> >>>>>>> seem
> >>>>>>>>> to me to justify a whole public facing layer.
> >>>>>>>>> - Yes, this is fairly esoteric, essentially I think it is
> >>> fairly
> >>>>>>> similar
> >>>>>>>> to
> >>>>>>>>> databases like DynamoDB that allow you to specify two
> >>> partition
> >>>>> keys
> >>>>>> (I
> >>>>>>>>> think DynamoDB does this...). The reasoning is that in fact
> >>> there
> >>>>> are
> >>>>>>>>> several things you can use the key field for: (1) to compute
> >>> the
> >>>>>>>> partition
> >>>>>>>>> to store the data in, (2) as a unique identifier to
> >>> deduplicate
> >>>>> that
> >>>>>>>>> partition's records within a log. These two things are almost
> >>>>> always
> >>>>>>> the
> >>>>>>>>> same, but occasionally may differ when you want to group data
> >>> in
> >>>> a
> >>>>>> more
> >>>>>>>>> sophisticated way than just a hash of the primary key but
> >>> still
> >>>>>> retain
> >>>>>>>> the
> >>>>>>>>> proper primary key for delivery to the consumer and log
> >>>> compaction.
> >>>>>>>>>
> >>>>>>>>> -Jay
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Jan 28, 2014 at 3:24 AM, Ross Black <
> >>>>> ross.w.bl...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Jay,
> >>>>>>>>>>
> >>>>>>>>>> - Just to add some more info/confusion about possibly using
> >>>>> Future
> >>>>>>> ...
> >>>>>>>>>>  If Kafka uses a JDK future, it plays nicely with other
> >>>>> frameworks
> >>>>>>> as
> >>>>>>>>>> well.
> >>>>>>>>>>  Google Guava has a ListenableFuture that allows callback
> >>>>> handling
> >>>>>>> to
> >>>>>>>> be
> >>>>>>>>>> added via the returned future, and allows the callbacks to
> >>> be
> >>>>>> passed
> >>>>>>>> off
> >>>>>>>>> to
> >>>>>>>>>> a specified executor.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/util/concurrent/ListenableFuture.html
> >>>>>>>>>>  The JDK future can easily be converted to a listenable
> >>>> future.
> >>>>>>>>>>
> >>>>>>>>>> - On the question of byte[] vs Object, could this be solved
> >>> by
> >>>>>>> layering
> >>>>>>>>> the
> >>>>>>>>>> API?  eg. a raw producer (use byte[] and specify the
> >>> partition
> >>>>>>> number)
> >>>>>>>>> and
> >>>>>>>>>> a normal producer (use generic object and specify a
> >>>> Partitioner)?
> >>>>>>>>>>
> >>>>>>>>>> - I am confused by the keys in ProducerRecord and
> >>> Partitioner.
> >>>>>> What
> >>>>>>> is
> >>>>>>>>> the
> >>>>>>>>>> usage for both a key and a partition key? (I am not yet
> >>> using
> >>>>> 0.8)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Ross
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 28 January 2014 05:00, Xavier Stevens <xav...@gaikai.com
> >>>>
> >>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> AutoCloseable would be nice for us as most of our code is
> >>>> using
> >>>>>>> Java
> >>>>>>>> 7
> >>>>>>>>> at
> >>>>>>>>>>> this point.
> >>>>>>>>>>>
> >>>>>>>>>>> I like Dropwizard's configuration mapping to POJOs via
> >>>> Jackson,
> >>>>>> but
> >>>>>>>> if
> >>>>>>>>>> you
> >>>>>>>>>>> wanted to stick with property maps I don't care enough to
> >>>>> object.
> >>>>>>>>>>>
> >>>>>>>>>>> If the producer only dealt with bytes, is there a way we
> >>>> could
> >>>>>>> still
> >>>>>>>>> do
> >>>>>>>>>>> partition plugins without specifying the number
> >>> explicitly? I
> >>>>>> would
> >>>>>>>>>> prefer
> >>>>>>>>>>> to be able to pass in field(s) that would be used by the
> >>>>>>> partitioner.
> >>>>>>>>>>> Obviously if this wasn't possible you could always
> >>>> deserialize
> >>>>>> the
> >>>>>>>>> object
> >>>>>>>>>>> in the partitioner and grab the fields you want, but that
> >>>> seems
> >>>>>>>> really
> >>>>>>>>>>> expensive to do on every message.
> >>>>>>>>>>>
> >>>>>>>>>>> It would also be nice to have a Java API Encoder
> >>> constructor
> >>>>>> taking
> >>>>>>>> in
> >>>>>>>>>>> VerifiableProperties. Scala understands how to handle
> >>> "props:
> >>>>>>>>>>> VerifiableProperties = null", but Java doesn't. So you
> >>> don't
> >>>>> run
> >>>>>>> into
> >>>>>>>>>> this
> >>>>>>>>>>> problem until runtime.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Xavier
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 27, 2014 at 9:37 AM, Clark Breyman <
> >>>>>> cl...@breyman.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Jay -
> >>>>>>>>>>>>
> >>>>>>>>>>>> Config - your explanation makes sense. I'm just so
> >>>> accustomed
> >>>>>> to
> >>>>>>>>> having
> >>>>>>>>>>>> Jackson automatically map my configuration objects to
> >>> POJOs
> >>>>>> that
> >>>>>>>> I've
> >>>>>>>>>>>> stopped using property files. They are lingua franca.
> >>> The
> >>>>> only
> >>>>>>>>> thought
> >>>>>>>>>>>> might be to separate the config interface from the
> >>>>>> implementation
> >>>>>>>> to
> >>>>>>>>>>> allow
> >>>>>>>>>>>> for alternatives, but that might undermine your point of
> >>>> "do
> >>>>> it
> >>>>>>>> this
> >>>>>>>>>> way
> >>>>>>>>>>> so
> >>>>>>>>>>>> that everyone can find it where they expect it".
> >>>>>>>>>>>>
> >>>>>>>>>>>> Serialization: Of the options, I like 1A the best,
> >>> though
> >>>>>>> possibly
> >>>>>>>>> with
> >>>>>>>>>>>> either an option to specify a partition key rather than
> >>> ID
> >>>>> or a
> >>>>>>>>> helper
> >>>>>>>>>> to
> >>>>>>>>>>>> translate an arbitrary byte[] or long into a partition
> >>>>> number.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Clark
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, Jan 26, 2014 at 9:13 PM, Jay Kreps <
> >>>>>> jay.kr...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the detailed thoughts. Let me elaborate on
> >>> the
> >>>>>>> config
> >>>>>>>>>> thing.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree that at first glance key-value strings don't
> >>> seem
> >>>>>> like
> >>>>>>> a
> >>>>>>>>> very
> >>>>>>>>>>>> good
> >>>>>>>>>>>>> configuration api for a client. Surely a well-typed
> >>>> config
> >>>>>>> class
> >>>>>>>>>> would
> >>>>>>>>>>> be
> >>>>>>>>>>>>> better! I actually disagree and let me see if I can
> >>>>> convince
> >>>>>>> you.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My reasoning has nothing to do with the api and
> >>>> everything
> >>>>> to
> >>>>>>> do
> >>>>>>>>> with
> >>>>>>>>>>>>> operations.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Clients are embedded in applications which are
> >>> themselves
> >>>>>>>>> configured.
> >>>>>>>>>>> In
> >>>>>>>>>>>>> any place that takes operations seriously the
> >>>> configuration
> >>>>>> for
> >>>>>>>>> these
> >>>>>>>>>>>>> applications will be version controlled and maintained
> >>>>>> through
> >>>>>>>> some
> >>>>>>>>>>> kind
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> config management system. If we give a config class
> >>> with
> >>>>>>> getters
> >>>>>>>>> and
> >>>>>>>>>>>>> setters the application has to expose those
> >>> properties to
> >>>>> its
> >>>>>>>>>>>>> configuration. What invariably happens is that the
> >>>>>> application
> >>>>>>>>>> exposes
> >>>>>>>>>>>> only
> >>>>>>>>>>>>> a choice few properties that they thought they would
> >>>>> change.
> >>>>>>>>>>> Furthermore
> >>>>>>>>>>>>> the application will make up a name for these configs
> >>>> that
> >>>>>>> seems
> >>>>>>>>>>>> intuitive
> >>>>>>>>>>>>> at the time in the 2 seconds the engineer spends
> >>> thinking
> >>>>>> about
> >>>>>>>> it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Now consider the result of this in the large. You end
> >>> up
> >>>>> with
> >>>>>>>>> dozens
> >>>>>>>>>> or
> >>>>>>>>>>>>> hundreds of applications that have the client
> >>> embedded.
> >>>>> Each
> >>>>>>>>> exposes
> >>>>>>>>>> a
> >>>>>>>>>>>>> different, inadequate subset of the possible configs,
> >>>> each
> >>>>>> with
> >>>>>>>>>>> different
> >>>>>>>>>>>>> names. It is a nightmare.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you use a string-string map the config system can
> >>>>> directly
> >>>>>>>> get a
> >>>>>>>>>>>> bundle
> >>>>>>>>>>>>> of config key-value pairs and put them into the
> >>> client.
> >>>>> This
> >>>>>>>> means
> >>>>>>>>>> that
> >>>>>>>>>>>> all
> >>>>>>>>>>>>> configuration is automatically available with the name
> >>>>>>> documented
> >>>>>>>>> on
> >>>>>>>>>>> the
> >>>>>>>>>>>>> website in every application that does this. If you
> >>>> upgrade
> >>>>>> to
> >>>>>>> a
> >>>>>>>>> new
> >>>>>>>>>>>> kafka
> >>>>>>>>>>>>> version with more configs those will be exposed too.
> >>> If
> >>>> you
> >>>>>>>> realize
> >>>>>>>>>>> that
> >>>>>>>>>>>>> you need to change a default you can just go through
> >>> your
> >>>>>>> configs
> >>>>>>>>> and
> >>>>>>>>>>>>> change it everywhere as it will have the same name
> >>>>>> everywhere.
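The string-string map argument can be illustrated with a small sketch in which an application forwards a whole bundle of key-value pairs to the client without naming any of them individually. The prefix convention ("producer.") and the key names used here are hypothetical, chosen only for the example; the thread does not prescribe how an application should select the bundle.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigPassThroughExample {
    /**
     * Copies every application config entry under the given prefix into a new map,
     * stripping the prefix, so the client sees its own documented key names.
     */
    static Map<String, String> clientConfig(Map<String, String> appConfig, String prefix) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : appConfig.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> appConfig = new HashMap<>();
        appConfig.put("http.port", "8080");              // application's own setting
        appConfig.put("producer.some.setting", "value"); // hypothetical client setting

        // The application never enumerates client settings; new ones pass through untouched.
        System.out.println(clientConfig(appConfig, "producer."));
    }
}
```

Because the application only forwards the bundle, a setting added in a later client version becomes available under its documented name with no application change.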
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jay
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Jan 26, 2014 at 4:47 PM, Clark Breyman <
> >>>>>>>> cl...@breyman.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Jay. I'll see if I can put together a more
> >>>>> complete
> >>>>>>>>>> response,
> >>>>>>>>>>>>>> perhaps as separate threads so that topics don't get
> >>>>>>> entangled.
> >>>>>>>>> In
> >>>>>>>>>>> the
> >>>>>>>>>>>>> mean
> >>>>>>>>>>>>>> time, here's a couple responses:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Serialization: you've broken out a sub-thread so
> >>> I'll
> >>>>> reply
> >>>>>>>>> there.
> >>>>>>>>>> My
> >>>>>>>>>>>>> bias
> >>>>>>>>>>>>>> is that I like generics (except for type-erasure)
> >>> and
> >>>> in
> >>>>>>>>> particular
> >>>>>>>>>>>> they
> >>>>>>>>>>>>>> make it easy to compose serializers for compound
> >>>> payloads
> >>>>>>> (e.g.
> >>>>>>>>>> when
> >>>>>>>>>>> a
> >>>>>>>>>>>>>> common header wraps a payload of parameterized
> >>> type).
> >>>>> I'll
> >>>>>>>>> respond
> >>>>>>>>>> to
> >>>>>>>>>>>>> your
> >>>>>>>>>>>>>> 4-options message with an example.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Build: I've seen a lot of "maven-compatible" build
> >>>>> systems
> >>>>>>>>> produce
> >>>>>>>>>>>>>> "artifacts" that aren't really artifacts - no
> >>> embedded
> >>>>> POM
> >>>>>>> or,
> >>>>>>>>>> worse,
> >>>>>>>>>>>>>> malformed POM. I know the sbt-generated artifacts
> >>> were
> >>>>> this
> >>>>>>>> way -
> >>>>>>>>>>> onus
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> on me to see what gradle is spitting out and what a
> >>>> maven
> >>>>>>> build
> >>>>>>>>>> might
> >>>>>>>>>>>>> look
> >>>>>>>>>>>>>> like. Maven may be old and boring, but it gets out
> >>> of
> >>>> the
> >>>>>> way
> >>>>>>>> and
> >>>>>>>>>>>>>> integrates really seamlessly with a lot of IDEs.
> >>> When
> >>>>> some
> >>>>>>>> scala
> >>>>>>>>>>>>> projects I
> >>>>>>>>>>>>>> was working on in the fall of 2011 switched from
> >>> sbt to
> >>>>>>> maven,
> >>>>>>>>>> build
> >>>>>>>>>>>>> became
> >>>>>>>>>>>>>> a non-issue.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Config: Not a big deal  and no, I don't think a
> >>>>> dropwizard
> >>>>>>>>>> dependency
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> appropriate. I do like using simple entity beans
> >>>> (POJO's
> >>>>>> not
> >>>>>>>>> j2EE)
> >>>>>>>>>>> for
> >>>>>>>>>>>>>> configuration, especially if they can be marshalled
> >>>>> without
> >>>>>>>>>>> annotation
> >>>>>>>>>>>> by
> >>>>>>>>>>>>>> Jackson. I only mentioned the dropwizard-extras
> >>>> because
> >>>>> it
> >>>>>>> has
> >>>>>>>>>> some
> >>>>>>>>>>>>> entity
> >>>>>>>>>>>>>> bean versions of the ZK and Kafka configs.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Domain-packaging: Also not a big deal - it's what's
> >>>>>> expected
> >>>>>>>> and
> >>>>>>>>>> it's
> >>>>>>>>>>>>>> pretty free in most IDE's. The advantages I see is
> >>> that
> >>>>> it
> >>>>>> is
> >>>>>>>>> clear
> >>>>>>>>>>>>> whether
> >>>>>>>>>>>>>> something is from the Apache Kafka project and
> >>> whether
> >>>>>>>> something
> >>>>>>>>> is
> >>>>>>>>>>>> from
> >>>>>>>>>>>>>> another org and related to Kafka. That said, nothing
> >>>>> really
> >>>>>>>>>> enforces
> >>>>>>>>>>>> it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Futures: I'll see if I can create some examples to
> >>>>>>> demonstrate
> >>>>>>>>>> Future
> >>>>>>>>>>>>>> making interop easier.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> C
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jan 24, 2014 at 4:36 PM, Jay Kreps <
> >>>>>>>> jay.kr...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hey Clark,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Serialization: Yes I agree with these though I
> >>>> don't
> >>>>>>>> consider
> >>>>>>>>>> the
> >>>>>>>>>>>>> loss
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> generics a big issue. I'll try to summarize what I
> >>>>> would
> >>>>>>>>> consider
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> best
> >>>>>>>>>>>>>>> alternative api with raw byte[].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Maven: We had this debate a few months back and
> >>> the
> >>>>>>>> consensus
> >>>>>>>>>> was
> >>>>>>>>>>>>>> gradle.
> >>>>>>>>>>>>>>> Is there a specific issue with the poms gradle
> >>>> makes? I
> >>>>>> am
> >>>>>>>>>>> extremely
> >>>>>>>>>>>>>> loath
> >>>>>>>>>>>>>>> to revisit the issue as build issues are a
> >>> recurring
> >>>>>> thing
> >>>>>>>> and
> >>>>>>>>> no
> >>>>>>>>>>> one
> >>>>>>>>>>>>>> ever
> >>>>>>>>>>>>>>> agrees and ultimately our build needs are very
> >>>> simple.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Config: I'm not sure if I follow the point. Are
> >>> you
> >>>>>>> saying
> >>>>>>>> we
> >>>>>>>>>>>> should
> >>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>> something in dropwizard for config? One principle
> >>>> here
> >>>>> is
> >>>>>>> to
> >>>>>>>>> try
> >>>>>>>>>> to
> >>>>>>>>>>>>>> remove
> >>>>>>>>>>>>>>> as many client dependencies as possible as we
> >>>>> inevitably
> >>>>>>> run
> >>>>>>>>> into
> >>>>>>>>>>>>>> terrible
> >>>>>>>>>>>>>>> compatibility issues with users who use the same
> >>>>> library
> >>>>>> or
> >>>>>>>> its
> >>>>>>>>>>>>>>> dependencies at different versions. Or are you
> >>>> talking
> >>>>>>> about
> >>>>>>>>>>>>> maintaining
> >>>>>>>>>>>>>>> compatibility with existing config parameters? I
> >>>> think
> >>>>> as
> >>>>>>>> much
> >>>>>>>>>> as a
> >>>>>>>>>>>>>> config
> >>>>>>>>>>>>>>> in the existing client makes sense it should have
> >>> the
> >>>>>> same
> >>>>>>>> name
> >>>>>>>>>> (I
> >>>>>>>>>>>> was
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> bit sloppy about that so I'll fix any errors
> >>> there).
> >>>>>> There
> >>>>>>>> are
> >>>>>>>>> a
> >>>>>>>>>>> few
> >>>>>>>>>>>>> new
> >>>>>>>>>>>>>>> things and we should give those reasonable
> >>> defaults.
> >>>> I
> >>>>>>> think
> >>>>>>>>>> config
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>> important so I'll start a thread on the config
> >>>> package
> >>>>> in
> >>>>>>>>> there.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - org.apache.kafka: We could do this. I always
> >>>>> considered
> >>>>>>> it
> >>>>>>>>> kind
> >>>>>>>>>>> of
> >>>>>>>>>>>> an
> >>>>>>>>>>>>>> odd
> >>>>>>>>>>>>>>> thing Java programmers do that has no real
> >>> motivation
> >>>>>> (but
> >>>>>>> I
> >>>>>>>>>> could
> >>>>>>>>>>> be
> >>>>>>>>>>>>>>> re-educated!). I don't think it ends up reducing
> >>>> naming
> >>>>>>>>> conflicts
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>> practice and it adds a lot of noise and nested
> >>>>>> directories.
> >>>>>>>> Is
> >>>>>>>>>>> there
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>> reason you prefer this or just to be more
> >>> standard?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Future: Basically I didn't see any particular
> >>>>>> advantage.
> >>>>>>>> The
> >>>>>>>>>>>> cancel()
> >>>>>>>>>>>>>>> method doesn't really make sense so probably
> >>> wouldn't
> >>>>>> work.
> >>>>>>>>>>> Likewise
> >>>>>>>>>>>> I
> >>>>>>>>>>>>>>> dislike the checked exceptions it requires.
> >>>> Basically I
> >>>>>>> just
> >>>>>>>>>> wrote
> >>>>>>>>>>>> out
> >>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>> code examples and it seemed cleaner with a special
> >>>>>> purpose
> >>>>>>>>>> object.
> >>>>>>>>>>> I
> >>>>>>>>>>>>>> wasn't
> >>>>>>>>>>>>>>> actually aware of plans for improved futures in
> >>> java
> >>>> 8
> >>>>> or
> >>>>>>> the
> >>>>>>>>>> other
> >>>>>>>>>>>>>>> integrations. Maybe you could elaborate on this a
> >>> bit
> >>>>> and
> >>>>>>>> show
> >>>>>>>>>> how
> >>>>>>>>>>> it
> >>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>> be used? Sounds promising, I just don't know a lot
> >>>>> about
> >>>>>>> it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Jay
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jan 24, 2014 at 3:30 PM, Clark Breyman <
> >>>>>>>>>> cl...@breyman.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jay - Thanks for the call for comments. Here's
> >>> some
> >>>>>>> initial
> >>>>>>>>>>> input:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Make message serialization a client
> >>>> responsibility
> >>>>>>>> (making
> >>>>>>>>>> all
> >>>>>>>>>>>>>> messages
> >>>>>>>>>>>>>>>> byte[]). Reflection-based loading makes it
> >>> harder
> >>>> to
> >>>>>> use
> >>>>>>>>>> generic
> >>>>>>>>>>>>> codecs
> >>>>>>>>>>>>>>>> (e.g.  Envelope<PREFIX, DATA, SUFFIX>) or build
> >>> up
> >>>>>> codec
> >>>>>>>>>>>>>>> programmatically.
> >>>>>>>>>>>>>>>> Non-default partitioning should require an
> >>> explicit
> >>>>>>>> partition
> >>>>>>>>>>> key.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - I really like the fact that it will be native
> >>>> Java.
> >>>>>>>> Please
> >>>>>>>>>>>> consider
> >>>>>>>>>>>>>>> using
> >>>>>>>>>>>>>>>> native maven and not sbt, gradle, ivy, etc as
> >>> they
> >>>>>> don't
> >>>>>>>>>> reliably
> >>>>>>>>>>>>> play
> >>>>>>>>>>>>>>> nice
> >>>>>>>>>>>>>>>> in the maven ecosystem. A jar without a
> >>> well-formed
> >>>>> pom
> >>>>>>>>> doesn't
> >>>>>>>>>>>> feel
> >>>>>>>>>>>>>>> like a
> >>>>>>>>>>>>>>>> real artifact. The pom's generated by sbt et al.
> >>>> are
> >>>>>> not
> >>>>>>>> well
> >>>>>>>>>>>> formed.
> >>>>>>>>>>>>>>> Using
> >>>>>>>>>>>>>>>> maven will make builds and IDE integration much
> >>>>>> smoother.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Look at Nick Telford's dropwizard-extras
> >>> package
> >>>> in
> >>>>>>> which
> >>>>>>>>> he
> >>>>>>>>>>>>> defines
> >>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>> Jackson-compatible POJO's for loading
> >>>> configuration.
> >>>>>>> Seems
> >>>>>>>>> like
> >>>>>>>>>>>> your
> >>>>>>>>>>>>>>> client
> >>>>>>>>>>>>>>>> migration is similar. The config objects should
> >>>> have
> >>>>>>>>>> constructors
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>>>> factories that accept Map<String, String> and
> >>>>>> Properties
> >>>>>>>> for
> >>>>>>>>>> ease
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> migration.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Would you consider using the org.apache.kafka
> >>>>> package
> >>>>>>> for
> >>>>>>>>> the
> >>>>>>>>>>> new
> >>>>>>>>>>>>> API
> >>>>>>>>>>>>>>>> (quibble)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Why create your own futures rather than use
> >>>>>>>>>>>>>>>> java.util.concurrent.Future<Long> or similar?
> >>>>> Standard
> >>>>>>>>> futures
> >>>>>>>>>>> will
> >>>>>>>>>>>>>> play
> >>>>>>>>>>>>>>>> nice with other reactive libs and things like
> >>> J8's
> >>>>>>>>>>>> CompletableFuture.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks again,
> >>>>>>>>>>>>>>>> C
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jan 24, 2014 at 2:46 PM, Roger Hoover <
> >>>>>>>>>>>>> roger.hoo...@gmail.com
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> A couple comments:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 1) Why does the config use a broker list
> >>> instead
> >>>> of
> >>>>>>>>>> discovering
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> brokers
> >>>>>>>>>>>>>>>>> in ZooKeeper?  It doesn't match the
> >>>>> HighLevelConsumer
> >>>>>>>> API.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2) It looks like broker connections are
> >>> created
> >>>> on
> >>>>>>>> demand.
> >>>>>>>>>> I'm
> >>>>>>>>>>>>>>> wondering
> >>>>>>>>>>>>>>>>> if sometimes you might want to flush out
> >>> config
> >>>> or
> >>>>>>>> network
> >>>>>>>>>>>>>> connectivity
> >>>>>>>>>>>>>>>>> issues before pushing the first message
> >>> through.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Should there also be a
> >>> KafkaProducer.connect() or
> >>>>>>> .open()
> >>>>>>>>>>> method
> >>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>> connectAll()?  I guess it would try to
> >>> connect to
> >>>>> all
> >>>>>>>>> brokers
> >>>>>>>>>>> in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> BROKER_LIST_CONFIG
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> HTH,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Roger
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jan 24, 2014 at 11:54 AM, Jay Kreps <
> >>>>>>>>>>> jay.kr...@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> As mentioned in a previous email we are
> >>> working
> >>>>> on
> >>>>>> a
> >>>>>>>>>>>>>>> re-implementation
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> the producer. I would like to use this email
> >>>>> thread
> >>>>>>> to
> >>>>>>>>>>> discuss
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> details
> >>>>>>>>>>>>>>>>>> of the public API and the configuration. I
> >>>> would
> >>>>>> love
> >>>>>>>> for
> >>>>>>>>>> us
> >>>>>>>>>>> to
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>> incredibly picky about this public api now
> >>> so
> >>>> it
> >>>>> is
> >>>>>>> as
> >>>>>>>>> good
> >>>>>>>>>>> as
> >>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> we don't need to break it in the future.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The best way to get a feel for the API is
> >>>>> actually
> >>>>>> to
> >>>>>>>>> take
> >>>>>>>>>> a
> >>>>>>>>>>>> look
> >>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> javadoc, my hope is to get the api docs good
> >>>>> enough
> >>>>>>> so
> >>>>>>>>> that
> >>>>>>>>>>> it
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> self-explanatory:
> >>>>>>>>>>>>>>>>>>
> http://empathybox.com/kafka-javadoc/index.html?kafka/clients/producer/KafkaProducer.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Please take a look at this API and give me
> >>> any
> >>>>>>> thoughts
> >>>>>>>>> you
> >>>>>>>>>>> may
> >>>>>>>>>>>>>> have!
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> It may also be reasonable to take a look at
> >>> the
> >>>>>>>> configs:
> >>>>>>>>>>>>>>>>>>
> http://empathybox.com/kafka-javadoc/kafka/clients/producer/ProducerConfig.html
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The actual code is posted here:
> >>>>>>>>>>>>>>>>>>
> >>>> https://issues.apache.org/jira/browse/KAFKA-1227
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> A few questions or comments to kick things
> >>> off:
> >>>>>>>>>>>>>>>>>> 1. We need to make a decision on whether
> >>>>>>> serialization
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>>>>> user's
> >>>>>>>>>>>>>>>> key
> >>>>>>>>>>>>>>>>>> and value should be done by the user (with
> >>> our
> >>>>> api
> >>>>>>> just
> >>>>>>>>>>> taking
> >>>>>>>>>>>>>>> byte[])
> >>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>> if we should take an object and allow the
> >>> user
> >>>> to
> >>>>>>>>>> configure a
> >>>>>>>>>>>>>>>> Serializer
> >>>>>>>>>>>>>>>>>> class which we instantiate via reflection.
> >>> We
> >>>>> take
> >>>>>>> the
> >>>>>>>>>> latter
> >>>>>>>>>>>>>> approach
> >>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>> the current producer, and I have carried
> >>> this
> >>>>>> through
> >>>>>>>> to
> >>>>>>>>>> this
> >>>>>>>>>>>>>>>> prototype.
> >>>>>>>>>>>>>>>>>> The tradeoff I see is this: taking byte[] is
> >>>>>> actually
> >>>>>>>>>>> simpler,
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>> directly do whatever serialization they
> >>> like.
> >>>> The
> >>>>>>>>>>> complication
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> actually
> >>>>>>>>>>>>>>>>>> partitioning. Currently partitioning is done
> >>>> by a
> >>>>>>>> similar
> >>>>>>>>>>>> plug-in
> >>>>>>>>>>>>>> api
> >>>>>>>>>>>>>>>>>> (Partitioner) which the user can implement
> >>> and
> >>>>>>>> configure
> >>>>>>>>> to
> >>>>>>>>>>>>>> override
> >>>>>>>>>>>>>>>> how
> >>>>>>>>>>>>>>>>>> partitions are assigned. If we take byte[]
> >>> as
> >>>>> input
> >>>>>>>> then
> >>>>>>>>> we
> >>>>>>>>>>>> have
> >>>>>>>>>>>>> no
> >>>>>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>>>>> to the original object and partitioning
> >>> MUST be
> >>>>>> done
> >>>>>>> on
> >>>>>>>>> the
> >>>>>>>>>>>>> byte[].
> >>>>>>>>>>>>>>>> This
> >>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> fine for hash partitioning. However for
> >>> various
> >>>>>> types
> >>>>>>>> of
> >>>>>>>>>>>> semantic
> >>>>>>>>>>>>>>>>>> partitioning (range partitioning, or
> >>> whatever)
> >>>>> you
> >>>>>>>> would
> >>>>>>>>>> want
> >>>>>>>>>>>>>> access
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> original object. In the current approach a
> >>>>> producer
> >>>>>>> who
> >>>>>>>>>>> wishes
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> send
> >>>>>>>>>>>>>>>>>> byte[] they have serialized in their own
> >>> code
> >>>> can
> >>>>>>>>> configure
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> BytesSerialization we supply which is just a
> >>>> "no
> >>>>>> op"
> >>>>>>>>>>>>> serialization.
> >>>>>>>>>>>>>>>>>> 2. We should obsess over naming and make
> >>> sure
> >>>>> each
> >>>>>> of
> >>>>>>>> the
> >>>>>>>>>>> class
> >>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> good.
> >>>>>>>>>>>>>>>>>> 3. Jun has already pointed out that we need
> >>> to
> >>>>>>> include
> >>>>>>>>> the
> >>>>>>>>>>>> topic
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> partition in the response, which is
> >>> absolutely
> >>>>>>> right. I
> >>>>>>>>>>> haven't
> >>>>>>>>>>>>>> done
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> yet but that definitely needs to be there.
> >>>>>>>>>>>>>>>>>> 4. Currently RecordSend.await will throw an
> >>>>>> exception
> >>>>>>>> if
> >>>>>>>>>> the
> >>>>>>>>>>>>>> request
> >>>>>>>>>>>>>>>>>> failed. The intention here is that
> >>>>>>>>>>>> producer.send(message).await()
> >>>>>>>>>>>>>>>> exactly
> >>>>>>>>>>>>>>>>>> simulates a synchronous call. Guozhang has
> >>>> noted
> >>>>>> that
> >>>>>>>>> this
> >>>>>>>>>>> is a
> >>>>>>>>>>>>>>> little
> >>>>>>>>>>>>>>>>>> annoying since the user must then catch
> >>>>> exceptions.
> >>>>>>>>> However
> >>>>>>>>>>> if
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> remove
> >>>>>>>>>>>>>>>>>> this then if the user doesn't check for
> >>> errors
> >>>>> they
> >>>>>>>> won't
> >>>>>>>>>>> know
> >>>>>>>>>>>>> one
> >>>>>>>>>>>>>>> has
> >>>>>>>>>>>>>>>>>> occurred, which I predict will be a common
> >>>>> mistake.
> >>>>>>>>>>>>>>>>>> 5. Perhaps there is more we could do to make
> >>>> the
> >>>>>>> async
> >>>>>>>>>>>> callbacks
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> future
> >>>>>>>>>>>>>>>>>> we give back intuitive and easy to program
> >>>>> against?
> >>>>>>>>>>>>>>>>>>
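To make the trade-off in point 1 concrete, here is a minimal sketch, not the prototype's Partitioner API. The class and method names are hypothetical, and the range scheme assumes keys are user ids with 0 <= userId < maxId. It only illustrates that hash partitioning works on the serialized bytes alone, while a semantic scheme such as range partitioning needs the typed key.

```java
import java.util.Arrays;

// Hypothetical names throughout; this is an illustration, not Kafka's API.
final class PartitioningSketch {

    // Hash partitioning needs nothing but the serialized key bytes.
    static int hashPartition(byte[] keyBytes, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Range partitioning needs the original typed key. Here a user id is
    // mapped onto equal-width ranges of [0, maxId); assumes 0 <= userId < maxId.
    static int rangePartition(long userId, long maxId, int numPartitions) {
        long width = (maxId + numPartitions - 1) / numPartitions; // ceiling division
        return (int) Math.min(userId / width, numPartitions - 1);
    }
}
```

If the API took only byte[], a range partitioner like the second method would first have to deserialize the key again, which is the cost point 1 describes.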
> >>>>>>>>>>>>>>>>>> Some background info on implementation:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> At a high level the primary difference in
> >>> this
> >>>>>>> producer
> >>>>>>>>> is
> >>>>>>>>>>> that
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> removes
> >>>>>>>>>>>>>>>>>> the distinction between the "sync" and
> >>> "async"
> >>>>>>>> producer.
> >>>>>>>>>>>>>> Effectively
> >>>>>>>>>>>>>>>> all
> >>>>>>>>>>>>>>>>>> requests are sent asynchronously but always
> >>>>> return
> >>>>>> a
> >>>>>>>>> future
> >>>>>>>>>>>>>> response
> >>>>>>>>>>>>>>>>> object
> >>>>>>>>>>>>>>>>>> that gives the offset as well as any error
> >>> that
> >>>>> may
> >>>>>>>> have
> >>>>>>>>>>>> occurred
> >>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> request is complete. The batching that is
> >>> done
> >>>> in
> >>>>>> the
> >>>>>>>>> async
> >>>>>>>>>>>>>> producer
> >>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>> today is done whenever possible now. This
> >>> means
> >>>>>> that
> >>>>>>>> the
> >>>>>>>>>> sync
> >>>>>>>>>>>>>>> producer,
> >>>>>>>>>>>>>>>>>> under load, can get performance as good as
> >>> the
> >>>>>> async
> >>>>>>>>>> producer
> >>>>>>>>>>>>>>>>> (preliminary
> >>>>>>>>>>>>>>>>>> results show the producer getting 1m
> >>>>> messages/sec).
> >>>>>>>> This
> >>>>>>>>>>> works
> >>>>>>>>>>>>>>> similarly
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> group commit in databases but with respect
> >>> to
> >>>> the
> >>>>>>>> actual
> >>>>>>>>>>>> network
> >>>>>>>>>>>>>>>>>> transmission--any messages that arrive
> >>> while a
> >>>>> send
> >>>>>>> is
> >>>>>>>> in
> >>>>>>>>>>>>> progress
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>> batched together. It is also possible to
> >>>>> encourage
> >>>>>>>>> batching
> >>>>>>>>>>>> even
> >>>>>>>>>>>>>>> under
> >>>>>>>>>>>>>>>>> low
> >>>>>>>>>>>>>>>>>> load to save server resources by
> >>> introducing a
> >>>>>> delay
> >>>>>>> on
> >>>>>>>>> the
> >>>>>>>>>>>> send
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> allow
> >>>>>>>>>>>>>>>>>> more messages to accumulate; this is done
> >>> using
> >>>>> the
> >>>>>>>>>>>>> linger.ms config
> >>>>>>>>>>>>>>>>> (this
> >>>>>>>>>>>>>>>>>> is similar to Nagle's algorithm in TCP).
> >>>>>>>>>>>>>>>>>>
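The linger behaviour described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the prototype's accumulator: the class name, the maxBatchSize parameter, and the explicit nowMs timestamps are all hypothetical, chosen so the logic is easy to follow. A batch becomes sendable once it is full or once its oldest record has waited for the linger interval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of linger-style batching; not the prototype's code.
final class LingerSketch {
    private final int maxBatchSize;
    private final long lingerMs;
    private final List<byte[]> pending = new ArrayList<>();
    private long firstAppendMs = -1; // arrival time of the oldest pending record

    LingerSketch(int maxBatchSize, long lingerMs) {
        this.maxBatchSize = maxBatchSize;
        this.lingerMs = lingerMs;
    }

    // Buffer a record, remembering when the current batch started.
    void append(byte[] record, long nowMs) {
        if (pending.isEmpty()) {
            firstAppendMs = nowMs;
        }
        pending.add(record);
    }

    // A batch is ready when it is full or has lingered long enough.
    boolean ready(long nowMs) {
        if (pending.isEmpty()) {
            return false;
        }
        return pending.size() >= maxBatchSize || nowMs - firstAppendMs >= lingerMs;
    }

    // Hand the accumulated records to the sender and start a new batch.
    List<byte[]> drain() {
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

With a linger of 0 every non-empty batch is ready immediately, so batching then happens only when records arrive faster than they can be sent.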
> >>>>>>>>>>>>>>>>>> This producer does all network communication
> >>>>>>>>> asynchronously
> >>>>>>>>>>> and
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> parallel
> >>>>>>>>>>>>>>>>>> to all servers so the performance penalty
> >>> for
> >>>>>> acks=-1
> >>>>>>>> and
> >>>>>>>>>>>> waiting
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> replication should be much reduced. I
> >>> haven't
> >>>>> done
> >>>>>>> much
> >>>>>>>>>>>>>> benchmarking
> >>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> this yet, though.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The high level design is described a little
> >>>> here,
> >>>>>>>> though
> >>>>>>>>>> this
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> now
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>> little out of date:
> >>>>>>>>>>>>>>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Jay
> >>>>>>>>>>>>>>>>>>
