+1 for checking this in as is. For a from-scratch rewrite like this, I prefer to do incremental reviews on a standalone subproject until it is complete and stable enough to be merged into the main codebase. Looking forward to the patch!
Thanks,
Neha

On Thu, Jan 23, 2014 at 10:23 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey all,
>
> I have been working on a rewrite of the producer as described in the wiki
> below and discussed in a few previous threads:
> https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
>
> My code still has some bugs and is a bit rough in parts, but it
> functions in the basic cases. I did some basic performance tests over
> localhost, and the new approach has paid off quite significantly--for small
> (10 byte) messages a single thread on my laptop can send over 1m
> messages/second, and with larger messages easily maxes out the server.
>
> The difference between the "sync" and "async" producer largely disappears--all
> requests immediately return a future response which can be used to get the
> behavior of either sync or async usage, and we batch whenever the producer
> is under load using a "group commit"-like approach. You can encourage
> additional batching by incurring a small amount of latency (as before).
>
> Let's talk about how to integrate this code.
>
> This is a from-scratch rewrite of the producer code. As such it is a pretty
> major change. So far I have mostly been working on my own. I'd like to
> start getting feedback before I get too far along--no point in my polishing
> things that are going to be significantly revised in review, after all.
>
> As such here is what I would propose:
>
> 1. I'll put up a preliminary patch. Since this code is a completely
> standalone module it will not destabilize the existing server or existing
> producer (in fact there is no change to those). I will avoid including
> build support in this patch until we get the gradle stuff worked out so as
> to not break that patch (hopefully that moves along). Let's take this patch
> "as is" but with no expectation that the code is complete or that checkin
> implies everyone agrees with every design decision. I will follow up with
> subsequent patches as we do reviews and discussions.
>
> 2. I'll send out a few higher-level topics for discussion threads. Let's
> get to consensus on these. I think micro-reviewing minor correctness issues
> won't be productive until we make higher level decisions. The topics I'd
> like to discuss include:
>   a. The producer code:
>     - The public API
>     - The configurations: their names, and the general knobs we are
>     - Client message serialization
>     - The instrumentation to have
>     - The blocking and batching behavior
>   b. The common code and a few other cross-cutting policy things:
>     - The approach to protocol definition and request serialization
>     - The config definition helper code
>     - The metrics package
>     - The project layout
>     - The Java coding style and the use of Java
>     - The approach to logging
>
> This is somewhat backwards, but I think it will be easier to handle changes
> that fall out of these discussions against an existing code base that is
> checked in; otherwise each revision will be a brand new very large patch.
>
> If there are no objections I will toss up this code and kick off some of these
> discussions.
>
> -Jay
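
The future-returning send and load-driven batching described in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchProducer`, `RecordAck`, `send`, and `flush` are hypothetical names and are not taken from the patch, the "network send" is simulated in memory, and `CompletableFuture` is used purely for convenience.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; none of these names come from the actual patch.
class SketchProducer {

    /** Hypothetical acknowledgement carrying the offset assigned to a record. */
    static final class RecordAck {
        public final long offset;
        RecordAck(long offset) { this.offset = offset; }
    }

    /** A record waiting in the current batch, paired with its future. */
    private static final class Pending {
        final byte[] payload;
        final CompletableFuture<RecordAck> future = new CompletableFuture<>();
        Pending(byte[] payload) { this.payload = payload; }
    }

    private final List<Pending> batch = new ArrayList<>();
    private long nextOffset = 0;

    /** Never blocks: the record joins the current batch and a future is returned. */
    public synchronized CompletableFuture<RecordAck> send(byte[] payload) {
        Pending p = new Pending(payload);
        batch.add(p);
        return p.future;
    }

    /**
     * Stand-in for the I/O thread. Everything that accumulated since the last
     * call goes out as one simulated request, so batches grow on their own
     * when callers produce faster than requests complete.
     * Returns the number of records in the batch.
     */
    public int flush() {
        List<Pending> drained;
        long base;
        synchronized (this) {
            drained = new ArrayList<>(batch);
            batch.clear();
            base = nextOffset;
            nextOffset += drained.size();
        }
        // Complete futures outside the lock so callbacks cannot block senders.
        for (int i = 0; i < drained.size(); i++) {
            drained.get(i).future.complete(new RecordAck(base + i));
        }
        return drained.size();
    }
}
```

With an API of this shape, a caller would get "sync" behavior by waiting on the returned future (for example `producer.send(bytes).join()`, with `flush` running on another thread) and "async" behavior by attaching a callback such as `thenAccept` or by ignoring the future altogether.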