Re: [DISCUSS] KIP-28 - Add a transform client for data processing

Jiangjie Qin Thu, 23 Jul 2015 23:32:59 -0700

Hey Guozhang,

I just took a quick look at the KIP. Is it very similar to mirror maker
with a message handler?


Thanks,

Jiangjie (Becket) Qin

On Thu, Jul 23, 2015 at 10:25 PM, Ewen Cheslack-Postava <[email protected]>
wrote:

> Just some notes on the KIP doc itself:
>
> * It'd be useful to clarify at what point the plain consumer + custom code
> + producer breaks down. I think trivial filtering and aggregation on a
> single stream usually work fine with this model. Anything where you need
> more complex joins, windowing, etc. is where it breaks down. I think most
> interesting applications require that functionality, but it's helpful to
> make this really clear in the motivation -- right now, Kafka only provides
> the lowest level plumbing for stream processing applications, so most
> interesting apps require very heavyweight frameworks.
> * I think the feature comparison of plain producer/consumer, stream
> processing frameworks, and this new library is a good start, but we might
> want something more thorough and structured, like a feature matrix. Right
> now it's hard to figure out exactly how they relate to each other.
> * I'd personally push the library vs. framework story very strongly -- the
> total buy-in and weak integration story of stream processing frameworks is
> a big downside and makes a library a really compelling (and currently
> unavailable, as far as I am aware) alternative.
> * Comment about in-memory storage of other frameworks is interesting -- it
> is specific to the framework, but is supposed to also give performance
> benefits. The high-level functional processing interface would allow for
> combining multiple operations when there's no shuffle, but when there is a
> shuffle, we'll always be writing to Kafka, right? Spark (and presumably
> Spark Streaming) is supposed to get a big win by handling shuffles such
> that the data just stays in cache and never actually hits disk, or at least
> hits disk in the background. Will we take a hit because we always write to
> Kafka?
> * I really struggled with the structure of the KIP template with Copycat
> because the flow doesn't work well for proposals like this. They aren't
> changes as concrete as the ones the KIP template was designed for. I'd completely
> ignore that template in favor of optimizing for clarity if I were you.
>
> -Ewen
>
> On Thu, Jul 23, 2015 at 5:59 PM, Guozhang Wang <[email protected]> wrote:
>
> > Hi all,
> >
> > I just posted KIP-28: Add a transform client for data processing
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+transform+client+for+data+processing>.
> >
> > The wiki page does not yet have the full design / implementation details,
> > and this email is to kick off the conversation on whether we should add
> > this new client with the described motivations, and if yes, what features /
> > functionalities should be included.
> >
> > Looking forward to your feedback!
> >
> > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Ewen
>
