Does the transformation need to be 1-to-1? For example, some users model
each Kafka message as schema + a batch of binary records. When using a sink
connector to push the Kafka data to a sink, it would be useful if the
transformer could convert each Kafka message to multiple records.
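A 1-to-many hook could look roughly like the sketch below. This is purely illustrative: the `OneToManyTransformer` interface, the use of `String` as the record type, and the comma-split example are assumptions for discussion, not the KIP-66 API.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a transformer that may return several output
// records for one input record, e.g. unpacking a batched Kafka message.
interface OneToManyTransformer<R> {
    List<R> transform(R record);
}

class BatchSplitter implements OneToManyTransformer<String> {
    @Override
    public List<String> transform(String record) {
        // One message carrying comma-separated payloads expands into
        // one output record per payload.
        return Arrays.asList(record.split(","));
    }
}
```

Returning a `List` also subsumes filtering (return an empty list) and the 1-to-1 case (return a single-element list).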

Thanks,

Jun

On Sat, Jul 16, 2016 at 1:25 PM, Nisarg Shah <snis...@gmail.com> wrote:

> Gwen,
>
> Yup, that sounds great! Instead of leaving it up to the transformers to
> handle null, we can have the topic be null. Sounds good. To get rid
> of a message, set the topic to a special one (could be as simple as null).
>
> Like I said before, the more interesting part would be ‘adding’ a new
> message to the existing list, based on, say, the current message in the
> transformer. Does that feature warrant inclusion?
>
> > On Jul 14, 2016, at 22:25, Gwen Shapira <g...@confluent.io> wrote:
> >
> > I used to work on Apache Flume, where we used to allow users to filter
> > messages completely in the transformation and then we got rid of it,
> > because we spent too much time trying to help users who had "message
> > loss", where the loss was actually a bug in the filter...
> >
> > What we couldn't do in Flume, but perhaps can do in the simple
> > transform for Connect is the ability to route messages to different
> > topics, with "null" as one of the possible targets. This will allow
> > you to implement a dead-letter-queue functionality and redirect
> > messages that don't pass filter to an "errors" topic without getting
> > rid of them completely, while also allowing braver users to get rid of
> > messages by directing them to "null".
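The routing idea above could be sketched as a toy helper, where each record is assigned a destination topic, the "errors" topic acts as the dead-letter queue, and null means "drop". The class, method names, and topic names here are assumptions for illustration, not Connect's API.

```java
// Toy sketch of per-record topic routing with "null" as a drop target.
class TopicRouter {
    static final String ERRORS_TOPIC = "errors";

    // Returns the destination: the original topic if the record passes the
    // filter, the errors topic under a dead-letter policy, or null to discard.
    static String route(String topic, String value, boolean deadLetter) {
        boolean passes = value != null && !value.isEmpty();
        if (passes) return topic;
        return deadLetter ? ERRORS_TOPIC : null;  // null => message dropped
    }
}
```

The appeal of this shape is that "filtering" is no longer a special case: it is just routing to the null destination, and the safer DLQ behavior is the same code path with a different target.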
> >
> > Does that make sense?
> >
> > Gwen
> >
> > On Thu, Jul 14, 2016 at 8:33 PM, Nisarg Shah <snis...@gmail.com> wrote:
> >> Thank you for your inputs Gwen and Michael.
> >>
> >> The original reason I suggested set-based processing is the flexibility
> >> it provides. The JIRA had a comment from a user requesting a feature
> >> that could be achieved with this.
> >>
> >> After reading Gwen's and Michael's points (I went through the
> >> documentation and the code in detail), I agree with what you have to
> >> say. Also, fewer guarantees make what I had in mind less certain, so
> >> simplifying it to a single-message transformation would ensure that
> >> users who do require more flexibility with the transformations will
> >> automatically “turn to” Kafka Streams. Transformation logic on a
> >> message-by-message basis makes more sense.
> >>
> >> One use case that Kafka Connect could consider is adding or removing a
> >> message completely. (This was trivially possible with the collection
> >> passing.) Should users be pointed towards Kafka Streams even for this
> >> use case? I think this is a very useful feature for Connect too, and
> >> I’ll try to rethink the API as well.
> >>
> >> Removing a message is as easy as returning a null and having the next
> >> transformer skip it, but adding messages would involve, say, a queue
> >> between transformers and a method which says “pass message” to the
> >> next, which can be called multiple times from one “transform” function;
> >> a variation on the chain of responsibility design pattern.
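The "pass message to the next" idea could be sketched as below: instead of returning one record, a stage calls an emit callback zero or more times, so it can drop, keep, or add messages. The `Chain`/`Stage` names and the `String` record type are illustrative assumptions, not a proposed Connect API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the chain-of-responsibility variation: each stage emits any
// number of records downstream rather than returning exactly one.
class Chain {
    interface Stage {
        void transform(String record, Consumer<String> emit);
    }

    // Runs every record through each stage in order, collecting whatever
    // each stage chooses to emit before feeding the next stage.
    static List<String> run(List<Stage> stages, List<String> input) {
        List<String> current = input;
        for (Stage stage : stages) {
            List<String> next = new ArrayList<>();
            for (String record : current) {
                stage.transform(record, next::add);
            }
            current = next;
        }
        return current;
    }
}
```

A filtering stage simply never calls emit; a fan-out stage calls it several times, which covers both the "remove" and "add" cases with one mechanism.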
> >>
> >>> On Jul 12, 2016, at 12:54 AM, Michael Noll <mich...@confluent.io>
> wrote:
> >>>
> >>> As Gwen said, my initial thought is that message transformations that
> >>> are "more than trivial" should be done by Kafka Streams, rather than
> >>> by Kafka Connect (for the reasons that Gwen mentioned).
> >>>
> >>> Transforming one message at a time would be a good fit for Kafka
> >>> Connect. An important use case is to remove sensitive data (such as
> >>> PII) from an incoming data stream before it hits Kafka's persistent
> >>> storage -- this use case can't be implemented well with Kafka Streams
> >>> because, by design, Kafka Streams is meant to read its input data from
> >>> Kafka (i.e. at the point when Kafka Streams could be used to remove
> >>> sensitive data fields, the data is already stored persistently in
> >>> Kafka, and this might be a no-go depending on the use case).
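The PII use case above fits the single-message model naturally, since each record can be scrubbed in isolation. A minimal sketch, where the email-matching regex and placeholder are illustrative assumptions about what "scrubbing" means:

```java
// Toy single-message scrubber: mask email-like substrings in a record's
// value before it reaches Kafka. Policy and pattern are illustrative only.
class PiiScrubber {
    static String scrub(String value) {
        // Replace anything shaped like an email address with a placeholder.
        return value.replaceAll("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+", "<redacted>");
    }
}
```

Because the transform needs only the current record, it requires none of the windowing or state semantics that would push the job toward Kafka Streams.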
> >>>
> >>> I'm of course interested to hear what other people think.
> >>>
> >>>
> >>> On Tue, Jul 12, 2016 at 6:06 AM, Gwen Shapira <g...@confluent.io>
> wrote:
> >>>
> >>>> I think we need to restrict the functionality to
> >>>> one-message-at-a-time.
> >>>>
> >>>> Basically, Connect gives very few guarantees about the size or
> >>>> composition of the set (you may get the same messages over and over,
> >>>> a mix of old and new, etc.).
> >>>>
> >>>> In order to do useful things over a collection, you need
> >>>> better-defined semantics of what's included. Kafka Streams is putting
> >>>> tons of effort into having good windowing semantics, and I think apps
> >>>> that require modification of collections are a better fit there.
> >>>>
> >>>> I'm willing to change my mind though (it has been known to happen) --
> >>>> what are the comments about usage that point toward the collections
> >>>> approach?
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Mon, Jul 11, 2016 at 3:32 PM, Nisarg Shah <snis...@gmail.com>
> wrote:
> >>>>
> >>>>> Thanks Jay, added that to the KIP.
> >>>>>
> >>>>> Besides reviewing the KIP as a whole, I wanted to know what everyone
> >>>>> thinks about the level at which the Transformer should deal with
> >>>>> data: transform the whole Collection of Records (giving the
> >>>>> flexibility of modifying messages across the set), OR transform
> >>>>> messages one at a time, iteratively, which will restrict
> >>>>> modifications across messages.
> >>>>>
> >>>>> I’ll get a working sample ready soon to have a look at. There were
> >>>>> some comments about Transformer usage that pointed to the first
> >>>>> approach, which I prefer too, given the flexibility.
> >>>>>
> >>>>>> On Jul 11, 2016, at 2:49 PM, Jay Kreps <j...@confluent.io> wrote:
> >>>>>>
> >>>>>> One minor thing: the Transformer interface probably needs a close()
> >>>>>> method (i.e. the opposite of initialize). This would be used for any
> >>>>>> transformer that uses a resource like a file/socket/db
> >>>>>> connection/etc. that needs to be closed. You usually don't need
> >>>>>> this, but when you do need it you really need it.
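The initialize/close lifecycle Jay describes could be sketched as follows. The interface name, signatures, and the toy uppercase transform are illustrative assumptions, not the actual KIP-66 proposal.

```java
import java.util.Map;

// Sketch of a transformer lifecycle: initialize() acquires any resource
// (file/socket/DB connection) and close() releases it.
interface LifecycleTransformer<R> extends AutoCloseable {
    void initialize(Map<String, String> config);
    R transform(R record);
    @Override void close();  // narrowed to throw no checked exception
}

class UppercaseTransformer implements LifecycleTransformer<String> {
    private boolean open = false;

    @Override public void initialize(Map<String, String> config) {
        open = true;  // stand-in for opening a real resource
    }

    @Override public String transform(String record) {
        if (!open) throw new IllegalStateException("not initialized");
        return record.toUpperCase();
    }

    @Override public void close() {
        open = false;  // stand-in for releasing the resource
    }
}
```

Extending `AutoCloseable` also lets the runtime manage transformers with try-with-resources, so close() is called even when a transform throws.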
> >>>>>>
> >>>>>> -Jay
> >>>>>>
> >>>>>> On Mon, Jul 11, 2016 at 1:47 PM, Nisarg Shah <snis...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> This KIP <
> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-66:+Add+Kafka+Connect+Transformers+to+allow+transformations+to+messages>
> >>>>>>> is for KAFKA-3209 <https://issues.apache.org/jira/browse/KAFKA-3209>.
> >>>>>>> It’s about capabilities to transform messages in Kafka Connect.
> >>>>>>>
> >>>>>>> Some design decisions need to be taken, so please advise me on the
> >>>>>>> same. Feel free to express any thoughts or concerns as well.
> >>>>>>>
> >>>>>>> Many many thanks to Ewen Cheslack-Postava.
> >>>>>>>
> >>>>>>> -Nisarg
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Michael Noll
> >>>
> >>>
> >>>
> >>> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860*
> >>> *Download Apache Kafka and Confluent Platform: www.confluent.io/download
> >>> <http://www.confluent.io/download>*
> >>
>
>
