Re: [DISCUSS] KIP-28 - Add a transform client for data processing

Neha Narkhede Fri, 24 Jul 2015 10:17:53 -0700

Agree that the normal KIP process is awkward for larger changes like this.
I'm a +1 on trying out this new process for the processor client, seeing how
it works out, and then making that the process for future large changes of
this nature.


On Fri, Jul 24, 2015 at 10:03 AM, Jay Kreps <[email protected]> wrote:

> I agree that the KIP process doesn't fit well for big areas of development
> like the new consumer, copycat, or this.
>
> I think the approach for copycat where we do a "should this exist" KIP vote
> followed by a review on code checkin isn't ideal because of course the
> question of "should we do it" is directly tied to the question of "what
> will it look like". I'm sure any of us could either be in favor or opposed
> to copycat depending on the details of what it looks like. And for these
> big things you really need to have a fairly complete prototype to get into
> details of how it will work. But we definitely want to do these kinds of
> things collaboratively so we don't want to wait until we have a finished
> prototype and then dump out the code and KIP in final form. My experience
> is that it is pretty hard to influence things that are this far along
> because by then all the ideas have kind of solidified in the authors'
> minds.
>
> So I think the proposal for this one is to try the following:
> 1. Throw out a stub KIP with essentially no concrete design other than a
> problem statement and niche we are trying to address. Start discussion on
> this but no vote (because what are you really voting on?).
> 2. Get a WIP prototype patch out there quickly and discuss that as it is
> being developed and refined.
> 3. Solidify the prototype patch and KIP together and do a vote on the KIP
> as the final design solidifies.
> 4. Do the normal review process for the patch, more or less decoupled from
> the KIP discussion, covering implementation rather than design and user APIs
> (which the KIP discussion would cover).
>
> Does this make sense to people? If so let's try it and if we like it better
> we can formally make that the process for this kind of big thing.
>
> -Jay
>
>
>
> On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <[email protected]> wrote:
>
> > Just some notes on the KIP doc itself:
> >
> > * It'd be useful to clarify at what point the plain consumer + custom code
> > + producer breaks down. I think trivial filtering and aggregation on a
> > single stream usually work fine with this model. Anything where you need
> > more complex joins, windowing, etc. is where it breaks down. I think most
> > interesting applications require that functionality, but it's helpful to
> > make this really clear in the motivation -- right now, Kafka only provides
> > the lowest level plumbing for stream processing applications, so most
> > interesting apps require very heavyweight frameworks.
> > * I think the feature comparison of plain producer/consumer, stream
> > processing frameworks, and this new library is a good start, but we might
> > want something more thorough and structured, like a feature matrix. Right
> > now it's hard to figure out exactly how they relate to each other.
> > * I'd personally push the library vs. framework story very strongly -- the
> > total buy-in and weak integration story of stream processing frameworks is
> > a big downside and makes a library a really compelling (and currently
> > unavailable, as far as I am aware) alternative.
> > * Comment about in-memory storage of other frameworks is interesting -- it
> > is specific to the framework, but is supposed to also give performance
> > benefits. The high-level functional processing interface would allow for
> > combining multiple operations when there's no shuffle, but when there is a
> > shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> > Spark Streaming) is supposed to get a big win by handling shuffles such
> > that the data just stays in cache and never actually hits disk, or at least
> > hits disk in the background. Will we take a hit because we always write to
> > Kafka?
> > * I really struggled with the structure of the KIP template with Copycat
> > because the flow doesn't work well for proposals like this. They aren't as
> > concrete as the changes the KIP template was designed for. I'd completely
> > ignore that template in favor of optimizing for clarity if I were you.
> >
> > -Ewen
> >
> > On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I just posted KIP-28: Add a transform client for data processing
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> > >
> > > The wiki page does not yet have the full design / implementation details,
> > > and this email is to kick off the conversation on whether we should add
> > > this new client with the described motivations, and if yes what features /
> > > functionalities should be included.
> > >
> > > Looking forward to your feedback!
> > >
> > > -- Guozhang
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>



-- 
Thanks,
Neha
