Gwen, 

Yup, that sounds great! Instead of leaving it up to the transformers to handle 
null, we can use a null topic instead. Sounds good. To get rid of a 
message, set the topic to a special value (which could be as simple as null). 
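For illustration, a rough sketch of that idea (and of the routing Gwen describes below), written against a hypothetical per-message transform; SourceRecord is the existing Connect class, everything else here is a placeholder:

    import org.apache.kafka.connect.source.SourceRecord;

    public class DropOrRouteSketch {

        // Re-create the record with a new topic: a null topic would mean "discard",
        // while a name like "errors" would act as a dead-letter queue.
        static SourceRecord withTopic(SourceRecord record, String newTopic) {
            return new SourceRecord(
                    record.sourcePartition(), record.sourceOffset(),
                    newTopic,
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value());
        }

        // Hypothetical per-message transform: records failing the check are either
        // redirected to an "errors" topic or dropped outright via a null topic.
        static SourceRecord transform(SourceRecord record, boolean keepBadRecords) {
            boolean passesFilter = record.value() != null; // stand-in for a real check
            if (passesFilter) {
                return record;
            }
            return withTopic(record, keepBadRecords ? "errors" : null);
        }
    }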

Like I said before, the more interesting part would be ‘adding’ a new message 
to the existing list, based on, say, the current message in the transformer. Does 
that feature warrant inclusion? 

> On Jul 14, 2016, at 22:25, Gwen Shapira <g...@confluent.io> wrote:
> 
> I used to work on Apache Flume, where we used to allow users to filter
> messages completely in the transformation and then we got rid of it,
> because we spent too much time trying to help users who had "message
> loss", where the loss was actually a bug in the filter...
> 
> What we couldn't do in Flume, but perhaps can do in the simple
> transform for Connect, is the ability to route messages to different
> topics, with "null" as one of the possible targets. This would allow
> you to implement dead-letter-queue functionality and redirect
> messages that don't pass the filter to an "errors" topic without getting
> rid of them completely, while also allowing braver users to get rid of
> messages by directing them to "null".
> 
> Does that make sense?
> 
> Gwen
> 
> On Thu, Jul 14, 2016 at 8:33 PM, Nisarg Shah <snis...@gmail.com> wrote:
>> Thank you for your inputs Gwen and Michael.
>> 
>> The original reason I suggested set-based processing is the flexibility it 
>> provides. The JIRA had a comment from a user requesting a feature that could 
>> be achieved with this.
>> 
>> After reading Gwen's and Michael's points (I went through the documentation 
>> and the code in detail), I agree with what you have to say. Also, fewer 
>> guarantees make what I had in mind less certain, so simplifying it to a 
>> single-message-based transformation would ensure that users who do require 
>> more flexibility with transformations will naturally "turn to" Kafka 
>> Streams. Transformation logic on a message-by-message basis makes more 
>> sense.
>> 
>> One use case that Kafka Connect could consider is adding or removing a 
>> message completely. (This was trivially possible with collection passing.) 
>> Should users be pointed towards Kafka Streams even for this use case? I think 
>> this is a very useful feature for Connect too, and I’ll try to rethink the 
>> API as well. 
>> 
>> Removing a message is as easy as returning null and having the next 
>> transformer skip it, but adding messages would involve, say, a queue between 
>> transformers and a method that "passes" a message to the next one, which can 
>> be called multiple times from a single "transform" invocation; a variation on 
>> the chain-of-responsibility design pattern. 
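One rough shape for that, sketched against hypothetical interfaces (none of these names are part of Connect; SourceRecord is the only real class used):

    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical chain-of-responsibility style: each transformer receives one
    // record and may call emit() zero times (drop), once (pass through/modify),
    // or several times (add messages) to hand records to the next stage.
    interface RecordEmitter {
        void emit(SourceRecord record);
    }

    interface ChainedTransformer {
        void transform(SourceRecord record, RecordEmitter next);
    }

    // Example: pass the original record along and also add an "audit" copy.
    class AuditCopyTransformer implements ChainedTransformer {
        @Override
        public void transform(SourceRecord record, RecordEmitter next) {
            next.emit(record); // the original, unchanged
            next.emit(new SourceRecord(
                    record.sourcePartition(), record.sourceOffset(),
                    "audit", // hypothetical extra topic for the added copy
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value()));
        }
    }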
>> 
>>> On Jul 12, 2016, at 12:54 AM, Michael Noll <mich...@confluent.io> wrote:
>>> 
>>> As Gwen said, my initial thought is that message transformations that are
>>> "more than trivial" should be done by Kafka Streams rather than by
>>> Kafka Connect (for the reasons that Gwen mentioned).
>>> 
>>> Transforming one message at a time would be a good fit for Kafka Connect.
>>> An important use case is to remove sensitive data (such as PII) from an
>>> incoming data stream before it hits Kafka's persistent storage -- this use
>>> case can't be implemented well with Kafka Streams because, by design, Kafka
>>> Streams is meant to read its input data from Kafka (i.e. at the point where
>>> Kafka Streams could be used to remove sensitive data fields, the data is
>>> already stored persistently in Kafka, and this might be a no-go depending
>>> on the use case).
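As a rough illustration of that use case, a sketch of a single-message transform that strips a sensitive field from schemaless (Map-valued) records; the class and method names are placeholders, only SourceRecord is the real Connect type:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.connect.source.SourceRecord;

    public class RemovePiiFieldSketch {

        // Returns a copy of the record with the given field removed from its value;
        // this sketch only handles schemaless records whose value is a Map.
        @SuppressWarnings("unchecked")
        static SourceRecord transform(SourceRecord record, String fieldToRemove) {
            Object value = record.value();
            if (!(value instanceof Map)) {
                return record; // structured (Struct) values are out of scope here
            }
            Map<String, Object> cleaned = new HashMap<>((Map<String, Object>) value);
            cleaned.remove(fieldToRemove);
            return new SourceRecord(
                    record.sourcePartition(), record.sourceOffset(),
                    record.topic(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), cleaned);
        }
    }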
>>> 
>>> I'm of course interested to hear what other people think.
>>> 
>>> 
>>> On Tue, Jul 12, 2016 at 6:06 AM, Gwen Shapira <g...@confluent.io> wrote:
>>> 
>>>> I think we need to restrict the functionality to one-message-at-a-time.
>>>> 
>>>> Basically, Connect gives very few guarantees about the size or composition
>>>> of the set (you may get the same messages over and over, a mix of old and
>>>> new, etc.).
>>>> 
>>>> In order to do useful things over a collection, you need better defined
>>>> semantics of what's included. Kafka Streams is putting tons of effort into
>>>> having good windowing semantics, and I think apps that require modification
>>>> of collections are a better fit there.
>>>> 
>>>> I'm willing to change my mind though (it has been known to happen) - what are
>>>> the comments about usage that point toward the collections approach?
>>>> 
>>>> Gwen
>>>> 
>>>> On Mon, Jul 11, 2016 at 3:32 PM, Nisarg Shah <snis...@gmail.com> wrote:
>>>> 
>>>>> Thanks Jay, added that to the KIP.
>>>>> 
>>>>> Besides reviewing the KIP as a whole, I wanted to know what everyone
>>>>> thinks about the level at which data should be handled by the Transformer:
>>>>> transform the whole Collection of Records (giving the flexibility of
>>>>> modifying messages across the set), OR transform messages one at a time,
>>>>> iteratively, which restricts modifications across messages.
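To make the two alternatives concrete, a rough sketch of their signatures (both interfaces are hypothetical; SourceRecord is the existing Connect class):

    import java.util.Collection;
    import org.apache.kafka.connect.source.SourceRecord;

    // Option 1: transform the whole batch, allowing modifications across messages
    // (adding, removing, or reordering records within the set).
    interface CollectionTransformer {
        Collection<SourceRecord> transform(Collection<SourceRecord> records);
    }

    // Option 2: transform one message at a time; no visibility into other records.
    interface SingleMessageTransformer {
        SourceRecord transform(SourceRecord record);
    }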
>>>>> 
>>>>> I’ll get a working sample ready soon to have a look at. There were some
>>>>> comments about Transformer usage that pointed to the first approach, which
>>>>> I prefer too, given the flexibility.
>>>>> 
>>>>>> On Jul 11, 2016, at 2:49 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>> 
>>>>>> One minor thing, the Transformer interface probably needs a close()
>>>>>> method (i.e. the opposite of initialize). This would be used for any
>>>>>> transformer that uses a resource like a file/socket/db connection/etc
>>>>>> that needs to be closed. You usually don't need this but when you do
>>>>>> need it you really need it.
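A sketch of what that lifecycle could look like, using the hypothetical interface and method names from this thread (initialize/transform) rather than any finalized API:

    import java.util.Map;
    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical transformer lifecycle: initialize() runs before any records are
    // seen, transform() runs per record, and close() releases resources such as
    // files, sockets, or DB connections when the task shuts down.
    interface Transformer extends AutoCloseable {

        void initialize(Map<String, String> config);

        SourceRecord transform(SourceRecord record);

        @Override
        void close(); // narrowed to not throw, so callers don't need try/catch
    }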
>>>>>> 
>>>>>> -Jay
>>>>>> 
>>>>>> On Mon, Jul 11, 2016 at 1:47 PM, Nisarg Shah <snis...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> This KIP
>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-66:+Add+Kafka+Connect+Transformers+to+allow+transformations+to+messages>
>>>>>>> is for KAFKA-3209 <https://issues.apache.org/jira/browse/KAFKA-3209>.
>>>>>>> It’s about capabilities to transform messages in Kafka Connect.
>>>>>>> 
>>>>>>> Some design decisions need to be made, so please advise me on them.
>>>>>>> Feel free to express any thoughts or concerns as well.
>>>>>>> 
>>>>>>> Many many thanks to Ewen Cheslack-Postava.
>>>>>>> 
>>>>>>> -Nisarg
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Michael Noll
>>> 
>>> 
>>> 
>>> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860*
>>> *Download Apache Kafka and Confluent Platform: www.confluent.io/download
>>> <http://www.confluent.io/download>*
>> 
