Awesome! +1 for checking this in as is, as you suggest.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
On Thu, Jan 23, 2014 at 2:37 PM, Jun Rao <jun...@gmail.com> wrote:

> This approach sounds reasonable to me. Since the new code will not be
> used in the current kafka jar, we can still release 0.8.1 off trunk when
> it's ready.
>
> Thanks,
>
> Jun
>
>
> On Thu, Jan 23, 2014 at 10:23 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey all,
> >
> > I have been working on a rewrite of the producer as described in the wiki
> > below and discussed in a few previous threads:
> > https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
> >
> > My code still has some bugs and is a bit rough in parts, but it
> > functions in the basic cases. I did some basic performance tests over
> > localhost, and the new approach has paid off quite significantly--for small
> > (10 byte) messages a single thread on my laptop can send over 1 million
> > messages/second, and with larger messages it easily maxes out the server.
> >
> > The difference between the "sync" and "async" producer largely disappears--all
> > requests immediately return a future response which can be used to get the
> > behavior of either sync or async usage, and we batch whenever the producer
> > is under load using a "group commit"-like approach. You can encourage
> > additional batching by incurring a small amount of latency (as before).
> >
> > Let's talk about how to integrate this code.
> >
> > This is a from-scratch rewrite of the producer code. As such it is a pretty
> > major change. So far I have mostly been working on my own. I'd like to
> > start getting feedback before I get too far along--no point in my polishing
> > things that are going to be significantly revised in review, after all.
> >
> > As such, here is what I would propose:
> >
> > 1. I'll put up a preliminary patch. Since this code is a completely
> > standalone module, it will not destabilize the existing server or existing
> > producer (in fact there is no change to those). I will avoid including
> > build support in this patch until we get the gradle stuff worked out, so as
> > to not break that patch (hopefully that moves along). Let's take this patch
> > "as is" but with no expectation that the code is complete or that checkin
> > implies everyone agrees with every design decision. I will follow up with
> > subsequent patches as we do reviews and discussions.
> >
> > 2. I'll send out a few higher-level topics for discussion threads. Let's
> > get to consensus on these. I think micro-reviewing minor correctness issues
> > won't be productive until we make higher-level decisions. The topics I'd
> > like to discuss include:
> >   a. The producer code:
> >     - The public API
> >     - The configurations: their names, and the general knobs we are
> >     - Client message serialization
> >     - The instrumentation to have
> >     - The blocking and batching behavior
> >   b. The common code and a few other cross-cutting policy things:
> >     - The approach to protocol definition and request serialization
> >     - The config definition helper code
> >     - The metrics package
> >     - The project layout
> >     - The Java coding style and the use of Java
> >     - The approach to logging
> >
> > This is somewhat backwards, but I think it will be easier to handle changes
> > that fall out of these discussions against an existing code base that is
> > checked in; otherwise each revision will be a brand new, very large patch.
> >
> > If there are no objections, I will toss up this code and kick off some of
> > these discussions.
> >
> > -Jay
> >
>
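Jay's description of the new producer (every request immediately returns a future, sync behavior comes from waiting on that future, and batching happens whenever the producer is under load via a "group commit"-like approach) can be illustrated with a small Java sketch. This is only an illustration under stated assumptions: `GroupCommitProducer`, `send`, `lingerMs`, and `batchSizes` are hypothetical names, not the API in Jay's patch or in any Kafka release, and `sendBatch` merely simulates one network request by handing out sequential offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the API of Jay's patch or any Kafka release.
class GroupCommitProducer implements AutoCloseable {
    // A queued payload paired with the future its caller is holding.
    private static final class Pending {
        final byte[] payload;
        final CompletableFuture<Long> result = new CompletableFuture<>();
        Pending(byte[] payload) { this.payload = payload; }
    }

    private final LinkedBlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final long lingerMs;                 // optional delay that trades latency for bigger batches
    private final List<Integer> batchSizes = new ArrayList<>(); // recorded for inspection only
    private final Thread sender;
    private volatile boolean running = true;
    private long nextOffset = 0;                 // simulated offsets; only the sender thread touches this

    GroupCommitProducer(long lingerMs) {
        this.lingerMs = lingerMs;
        this.sender = new Thread(this::runSender, "sketch-sender");
        this.sender.start();
    }

    // Never waits on the "network": enqueue and return a future immediately.
    CompletableFuture<Long> send(byte[] payload) {
        Pending p = new Pending(payload);
        queue.add(p);
        return p.result;
    }

    private void runSender() {
        List<Pending> batch = new ArrayList<>();
        try {
            // Keep draining after close() until everything queued has been sent.
            while (running || !queue.isEmpty()) {
                Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;      // idle: nothing to send yet
                if (lingerMs > 0) Thread.sleep(lingerMs);
                batch.add(first);
                queue.drainTo(batch);             // "group commit": take everything that piled up
                sendBatch(batch);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for one request carrying the whole batch; completes each future with an offset.
    private void sendBatch(List<Pending> batch) {
        synchronized (batchSizes) { batchSizes.add(batch.size()); }
        for (Pending p : batch) p.result.complete(nextOffset++);
    }

    List<Integer> batchSizes() {
        synchronized (batchSizes) { return new ArrayList<>(batchSizes); }
    }

    @Override
    public void close() throws InterruptedException {
        running = false;
        sender.join();
    }
}
```

In this sketch, sync-style use is just `producer.send(bytes).get()`, while async-style use attaches a callback with `whenComplete` and moves on. Batches grow only when sends arrive faster than the single sender thread drains them, which mirrors "we batch whenever the producer is under load"; `lingerMs` plays the role of the optional "small amount of latency" that encourages extra batching.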