Hey all,

I have been working on a rewrite of the producer as described in the wiki below and discussed in a few previous threads:
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite

My code still has some bugs and is a bit rough in parts, but it functions in the basic cases. I did some basic performance tests over localhost, and the new approach has paid off quite significantly--for small (10 byte) messages a single thread on my laptop can send over 1m messages/second, and with larger messages it easily maxes out the server.

The difference between the "sync" and "async" producer largely disappears--all requests immediately return a future response, which can be used to get the behavior of either sync or async usage, and we batch whenever the producer is under load using a "group commit"-like approach. You can encourage additional batching by incurring a small amount of latency (as before).

Let's talk about how to integrate this code. This is a from-scratch rewrite of the producer code. As such it is a pretty major change. So far I have mostly been working on my own. I'd like to start getting feedback before I get too far along--no point in my polishing things that are going to be significantly revised in review, after all.

As such here is what I would propose:

1. I'll put up a preliminary patch. Since this code is a completely standalone module it will not destabilize the existing server or existing producer (in fact there is no change to those). I will avoid including build support in this patch until we get the gradle stuff worked out so as to not break that patch (hopefully that moves along). Let's take this patch "as is" but with no expectation that the code is complete or that checkin implies everyone agrees with every design decision. I will follow up with subsequent patches as we do reviews and discussions.

2. I'll send out a few higher-level topics for discussion threads. Let's get to consensus on these. I think micro-reviewing minor correctness issues won't be productive until we make higher-level decisions. The topics I'd like to discuss include:

   a. The producer code:
      - The public API
      - The configurations: their names, and the general knobs we are
      - Client message serialization
      - The instrumentation to have
      - The blocking and batching behavior

   b. The common code and a few other cross-cutting policy things:
      - The approach to protocol definition and request serialization
      - The config definition helper code
      - The metrics package
      - The project layout
      - The Java coding style and the use of Java
      - The approach to logging

This is somewhat backwards, but I think it will be easier to handle changes that fall out of these discussions against an existing code base that is checked in; otherwise each revision will be a brand new, very large patch.

If there are no objections I will toss up this code and kick off some of these discussions.

-Jay