Re: [DISCUSS] KIP-28 - Add a transform client for data processing

Jay Kreps Fri, 24 Jul 2015 10:04:57 -0700

I agree that the KIP process doesn't fit well for big areas of development
like the new consumer, copycat, or this.

I think the approach for copycat where we do a "should this exist" KIP vote
followed by a review on code checkin isn't ideal because of course the
question of "should we do it" is directly tied to the question of "what
will it look like". I'm sure any of us could either be in favor or opposed
to copycat depending on the details of what it looks like. And for these
big things you really need to have a fairly complete prototype to get into
details of how it will work. But we definitely want to do these kinds of
things collaboratively so we don't want to wait until we have a finished
prototype and then dump out the code and KIP in final form. My experience
is that it is pretty hard to influence things that are this far along
because by then all the ideas have kind of solidified in the authors' minds.

So I think the proposal for this one is to try the following:
1. Throw out a stub KIP with essentially no concrete design other than a
problem statement and niche we are trying to address. Start discussion on
this but no vote (because what are you really voting on?).
2. Get a WIP prototype patch out there quickly and discuss that as it is
being developed and refined.
3. Solidify the prototype patch and KIP together and do a vote on the KIP
as the final design solidifies.
4. Do the normal review process for the patch, more or less decoupled from
the KIP discussion, covering implementation rather than design and user APIs
(which the KIP discussion would cover).

Does this make sense to people? If so let's try it and if we like it better
we can formally make that the process for this kind of big thing.

-Jay

On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Just some notes on the KIP doc itself:
>
> * It'd be useful to clarify at what point the plain consumer + custom code
> + producer breaks down. I think trivial filtering and aggregation on a
> single stream usually work fine with this model. Anything where you need
> more complex joins, windowing, etc. is where it breaks down. I think most
> interesting applications require that functionality, but it's helpful to
> make this really clear in the motivation -- right now, Kafka only provides
> the lowest level plumbing for stream processing applications, so most
> interesting apps require very heavyweight frameworks.
> * I think the feature comparison of plain producer/consumer, stream
> processing frameworks, and this new library is a good start, but we might
> want something more thorough and structured, like a feature matrix. Right
> now it's hard to figure out exactly how they relate to each other.
> * I'd personally push the library vs. framework story very strongly -- the
> total buy-in and weak integration story of stream processing frameworks is
> a big downside and makes a library a really compelling (and currently
> unavailable, as far as I am aware) alternative.
> * Comment about in-memory storage of other frameworks is interesting -- it
> is specific to the framework, but is supposed to also give performance
> benefits. The high-level functional processing interface would allow for
> combining multiple operations when there's no shuffle, but when there is a
> shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> Spark Streaming) is supposed to get a big win by handling shuffles such
> that the data just stays in cache and never actually hits disk, or at least
> hits disk in the background. Will we take a hit because we always write to
> Kafka?
> * I really struggled with the structure of the KIP template with Copycat
> because the flow doesn't work well for proposals like this. They aren't the
> kind of concrete change the KIP template was designed for. I'd completely
> ignore that template in favor of optimizing for clarity if I were you.
>
> -Ewen
>
> On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi all,
> >
> > I just posted KIP-28: Add a transform client for data processing
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> >
> > The wiki page does not yet have the full design / implementation details,
> > and this email is to kick-off the conversation on whether we should add
> > this new client with the described motivations, and if yes what features /
> > functionalities should be included.
> >
> > Looking forward to your feedback!
> >
> > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Ewen
>
