Gwen, Yup, that sounds great! Instead of keeping it up to the transformers to handle null, we can instead have the topic as null. Sounds good. To get rid of a message, set the topic to a special one (could be as simple as null).
Like I said before, the more interesting part would be ‘adding’ a new message to the existing list, based on say the current message in the transformer. Does that feature warrant to be included? > On Jul 14, 2016, at 22:25, Gwen Shapira <g...@confluent.io> wrote: > > I used to work on Apache Flume, where we used to allow users to filter > messages completely in the transformation and then we got rid of it, > because we spent too much time trying to help users who had "message > loss", where the loss was actually a bug in the filter... > > What we couldn't do in Flume, but perhaps can do in the simple > transform for Connect is the ability to route messages to different > topics, with "null" as one of the possible targets. This will allow > you to implement a dead-letter-queue functionality and redirect > messages that don't pass filter to an "errors" topic without getting > rid of them completely, while also allowing braver users to get rid of > messages by directing them to "null". > > Does that make sense? > > Gwen > > On Thu, Jul 14, 2016 at 8:33 PM, Nisarg Shah <snis...@gmail.com> wrote: >> Thank you for your inputs Gwen and Michael. >> >> The original reason why I suggested a set based processing is because of the >> flexibility is provides. The JIRA had a comment by a user requesting a >> feature that could be achieved with this. >> >> After reading Gwen and Michael's points, (I went through the documentation >> and the code in detail) and agree with what you have to say. Also, fewer >> guarantees make what I had in mind less certain and thus simplifying it to a >> single message based transformation would ensure that users who do require >> more flexibility with the transformations will automatically “turn to" Kafka >> Streams. The transformation logic on a message by message basis makes more >> sense. >> >> One usecase that Kafka Connect could consider is adding or removing a >> message completely. (This was trivially possible with the collection >> passing). Should users be pointed towards Kafka Streams even for this use >> case? I think this is a very useful feature for Connect too, and I’ll try to >> rethink on the API too. >> >> Removing a message is as easy as returning a null and having the next >> transformer skip it, but adding messages would involve say a queue between >> transformers and a method which says “pass message” to the next, which can >> be called multiple times from one “transform” function; a variation on the >> chain of responsibility design pattern. >> >>> On Jul 12, 2016, at 12:54 AM, Michael Noll <mich...@confluent.io> wrote: >>> >>> As Gwen said, my initial thought is that message transformations that are >>> "more than trivial" should rather be done by Kafka Streams, rather than by >>> Kafka Connect (for the reasons that Gwen mentioned). >>> >>> Transforming one message at a time would be a good fit for Kafka Connect. >>> An important use case is to remove sensitive data (such as PII) from an >>> incoming data stream before it hits Kafka's persistent storage -- this use >>> case can't be implemented well with Kafka Streams because, by design, Kafka >>> Streams is meant to read its input data from Kafka (i.e. at the point when >>> Kafka Streams could be used to removed sensitive data fields the data is >>> already stored persistently in Kafka, and this might be a no-go depending >>> on the use case). >>> >>> I'm of course interested to hear what other people think. >>> >>> >>> On Tue, Jul 12, 2016 at 6:06 AM, Gwen Shapira <g...@confluent.io> wrote: >>> >>>> I think we need to restrict the functionality to one-message-at-a-time. >>>> >>>> Basically, connect gives very little guarantees about the size of the set >>>> of the composition (you may get same messages over and over, mix of old and >>>> new, etc) >>>> >>>> In order to do useful things over a collection, you need better defined >>>> semantics of what's included. Kafka Streams is putting tons of effort into >>>> having good windowing semantics, and I think apps that require modification >>>> of collections are a better fit there. >>>> >>>> I'm willing to change my mind though (have been known to happen) - what are >>>> the comments about usage that point toward the collections approach? >>>> >>>> Gwen >>>> >>>> On Mon, Jul 11, 2016 at 3:32 PM, Nisarg Shah <snis...@gmail.com> wrote: >>>> >>>>> Thanks Jay, added that to the KIP. >>>>> >>>>> Besides reviewing the KIP as a whole, I wanted to know about what >>>> everyone >>>>> thinks about what data should be dealt at the Transformer level. >>>> Transform >>>>> the whole Collection of Records (giving the flexibility of modifying >>>>> messages across the set) OR >>>>> Transform messages one at a time, iteratively. This will restrict >>>>> modifications across messages. >>>>> >>>>> I’ll get a working sample ready soon, to have a look. There were some >>>>> comments about Transformer usage that pointed to the first approach, >>>> which >>>>> I prefer too given the flexibility. >>>>> >>>>>> On Jul 11, 2016, at 2:49 PM, Jay Kreps <j...@confluent.io> wrote: >>>>>> >>>>>> One minor thing, the Transformer interface probably needs a close() >>>>> method >>>>>> (i.e. the opposite of initialize). This would be used for any >>>> transformer >>>>>> that uses a resource like a file/socket/db connection/etc that needs to >>>>> be >>>>>> closed. You usually don't need this but when you do need it you really >>>>> need >>>>>> it. >>>>>> >>>>>> -Jay >>>>>> >>>>>> On Mon, Jul 11, 2016 at 1:47 PM, Nisarg Shah <snis...@gmail.com> >>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> This KIP < >>>>>>> >>>>> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-66:+Add+Kafka+Connect+Transformers+to+allow+transformations+to+messages >>>>>> >>>>>>> is for KAFKA-3209 <https://issues.apache.org/jira/browse/KAFKA-3209>. >>>>>>> It’s about capabilities to transform messages in Kafka Connect. >>>>>>> >>>>>>> Some design decisions need to be taken, so please advise me on the >>>> same. >>>>>>> Feel free to express any thoughts or concerns as well. >>>>>>> >>>>>>> Many many thanks to Ewen Cheslack-Postava. >>>>>>> >>>>>>> -Nisarg >>>>> >>>>> >>>> >>> >>> >>> >>> -- >>> Best regards, >>> Michael Noll >>> >>> >>> >>> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860Download >>> Apache Kafka and Confluent Platform: www.confluent.io/download >>> <http://www.confluent.io/download>* >>