Agree that the normal KIP process is awkward for larger changes like this. I'm +1 on trying out this new process for the processor client, seeing how it works out, and then making it the process for future large changes of this nature.
On Fri, Jul 24, 2015 at 10:03 AM, Jay Kreps <j...@confluent.io> wrote:
> I agree that the KIP process doesn't fit well for big areas of development
> like the new consumer, copycat, or this.
>
> I think the approach for copycat where we do a "should this exist" KIP vote
> followed by a review on code checkin isn't ideal because of course the
> question of "should we do it" is directly tied to the question of "what
> will it look like". I'm sure any of us could be either in favor of or
> opposed to copycat depending on the details of what it looks like. And for
> these big things you really need to have a fairly complete prototype to get
> into details of how it will work. But we definitely want to do these kinds
> of things collaboratively, so we don't want to wait until we have a
> finished prototype and then dump out the code and KIP in final form. My
> experience is that it is pretty hard to influence things that are this far
> along because by then all the ideas have kind of solidified in the authors'
> minds.
>
> So I think the proposal for this one is to try the following:
> 1. Throw out a stub KIP with essentially no concrete design other than a
> problem statement and niche we are trying to address. Start discussion on
> this but no vote (because what are you really voting on?).
> 2. Get a WIP prototype patch out there quickly and discuss that as it is
> being developed and refined.
> 3. Solidify the prototype patch and KIP together and do a vote on the KIP
> as the final design solidifies.
> 4. Do the normal review process for the patch more or less decoupled from
> the KIP discussion, covering implementation rather than design and user
> APIs (which the KIP discussion would cover).
>
> Does this make sense to people? If so let's try it and if we like it better
> we can formally make that the process for this kind of big thing.
>
> -Jay
>
> On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>
> > Just some notes on the KIP doc itself:
> >
> > * It'd be useful to clarify at what point the plain consumer + custom
> > code + producer breaks down. I think trivial filtering and aggregation
> > on a single stream usually work fine with this model. Anything where you
> > need more complex joins, windowing, etc. is where it breaks down. I think
> > most interesting applications require that functionality, but it's
> > helpful to make this really clear in the motivation -- right now, Kafka
> > only provides the lowest level plumbing for stream processing
> > applications, so most interesting apps require very heavyweight
> > frameworks.
> > * I think the feature comparison of plain producer/consumer, stream
> > processing frameworks, and this new library is a good start, but we might
> > want something more thorough and structured, like a feature matrix. Right
> > now it's hard to figure out exactly how they relate to each other.
> > * I'd personally push the library vs. framework story very strongly --
> > the total buy-in and weak integration story of stream processing
> > frameworks is a big downside and makes a library a really compelling (and
> > currently unavailable, as far as I am aware) alternative.
> > * Comment about in-memory storage of other frameworks is interesting --
> > it is specific to the framework, but is supposed to also give performance
> > benefits. The high-level functional processing interface would allow for
> > combining multiple operations when there's no shuffle, but when there is
> > a shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> > Spark Streaming) is supposed to get a big win by handling shuffles such
> > that the data just stays in cache and never actually hits disk, or at
> > least hits disk in the background. Will we take a hit because we always
> > write to Kafka?
> > * I really struggled with the structure of the KIP template with Copycat
> > because the flow doesn't work well for proposals like this. They aren't
> > changes as concrete as the ones the KIP template was designed for. I'd
> > completely ignore that template in favor of optimizing for clarity if I
> > were you.
> >
> > -Ewen
> >
> > On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I just posted KIP-28: Add a transform client for data processing
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> > >
> > > The wiki page does not yet have the full design / implementation
> > > details, and this email is to kick off the conversation on whether we
> > > should add this new client with the described motivations, and if yes
> > > what features / functionalities should be included.
> > >
> > > Looking forward to your feedback!
> > >
> > > -- Guozhang
> >
> > --
> > Thanks,
> > Ewen
>
--
Thanks,
Neha