Hey all,

I have been working on a rewrite of the producer as described in the wiki
below and discussed in a few previous threads:
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite

My code still has some bugs and is a bit rough in parts, but it
functions in the basic cases. I did some basic performance tests over
localhost, and the new approach has paid off quite significantly--for small
(10 byte) messages a single thread on my laptop can send over 1m
messages/second, and with larger messages it easily maxes out the server.

The difference between the "sync" and "async" producer largely disappears--all
requests immediately return a future response which can be used to get the
behavior of either sync or async usage, and we batch whenever the producer
is under load using a "group commit"-like approach. You can encourage
additional batching by incurring a small amount of latency (as before).
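
To make the future-based idea concrete, here is a minimal Java sketch. It is
not the code from the patch: the class name FutureProducerSketch, the send()
signature, and the lingerMs knob are all hypothetical names chosen for
illustration, and the "network send" is simulated by handing out increasing
offsets. It only shows how one call can serve both sync and async callers,
and how a single sender thread naturally batches whatever has queued up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class FutureProducerSketch {
    // A queued record paired with the future that will carry its (simulated) offset.
    private static final class Pending {
        final byte[] value;
        final CompletableFuture<Long> result = new CompletableFuture<>();
        Pending(byte[] value) { this.value = value; }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final long lingerMs;          // hypothetical knob: extra wait to grow a batch
    private final Thread sender;
    private volatile boolean running = true;
    private long nextOffset = 0;          // touched only by the sender thread

    public FutureProducerSketch(long lingerMs) {
        this.lingerMs = lingerMs;
        this.sender = new Thread(this::runSender, "sketch-sender");
        this.sender.setDaemon(true);
        this.sender.start();
    }

    // Never blocks on I/O: enqueue the record and hand back a future immediately.
    public CompletableFuture<Long> send(byte[] value) {
        Pending p = new Pending(value);
        queue.add(p);
        return p.result;
    }

    // "Group commit"-like loop: take one record, then grab everything else
    // that piled up while the previous batch was being handled.
    private void runSender() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (running || !queue.isEmpty()) {
                Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                if (lingerMs > 0) Thread.sleep(lingerMs); // trade latency for batching
                batch.add(first);
                queue.drainTo(batch);
                // A real producer would write the batch to the network here;
                // this sketch just acknowledges each record with the next offset.
                for (Pending p : batch) p.result.complete(nextOffset++);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stop accepting work and wait for the sender to flush the queue.
    public void close() {
        running = false;
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        FutureProducerSketch producer = new FutureProducerSketch(0);
        try {
            // "Sync" usage: block on the future right away.
            long offset = producer.send("hello".getBytes(StandardCharsets.UTF_8)).get();
            System.out.println("sync send acked at offset " + offset);

            // "Async" usage: attach a callback and keep going.
            CompletableFuture<Void> done = producer
                .send("world".getBytes(StandardCharsets.UTF_8))
                .thenAccept(o -> System.out.println("async send acked at offset " + o));
            done.get(); // only so the demo does not exit before the callback runs
        } finally {
            producer.close();
        }
    }
}
```

In this sketch a caller that wants sync behavior just blocks on the returned
future, while an async caller attaches a callback; the producer itself does
not need two modes. Setting the hypothetical lingerMs above zero makes the
sender wait briefly before draining the queue, which is the latency-for-
batching trade mentioned above.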

Let's talk about how to integrate this code.

This is a from-scratch rewrite of the producer code. As such it is a pretty
major change. So far I have mostly been working on my own. I'd like to
start getting feedback before I get too far along--no point in my polishing
things that are going to be significantly revised in review, after all.

As such here is what I would propose:

1. I'll put up a preliminary patch. Since this code is a completely
standalone module it will not destabilize the existing server or existing
producer (in fact there is no change to those). I will avoid including
build support in this patch until we get the gradle stuff worked out so as
to not break that patch (hopefully that moves along). Let's take this patch
"as is" but with no expectation that the code is complete or that checkin
implies everyone agrees with every design decision. I will follow up with
subsequent patches as we do reviews and discussions.

2. I'll send out a few higher-level topics for discussion threads. Let's
get to consensus on these. I think micro-reviewing minor correctness issues
won't be productive until we make higher level decisions. The topics I'd
like to discuss include:
a. The producer code:
     - The public API
     - The configurations: their names, and the general knobs we expose
     - Client message serialization
     - The instrumentation to have
     - The blocking and batching behavior
b. The common code and a few other cross-cutting policy things:
     - The approach to protocol definition and request serialization
     - The config definition helper code
     - The metrics package
     - The project layout
     - The Java coding style and the use of Java
     - The approach to logging

This is somewhat backwards, but I think it will be easier to handle changes
that fall out of these discussions against an existing code base that is
checked in; otherwise each revision will be a brand new, very large patch.

If there are no objections I will toss up this code and kick off some of these
discussions.

-Jay
