I used to work on Apache Flume, where we allowed users to filter messages
out completely in the transformation. We eventually removed that
capability, because we spent too much time trying to help users with
"message loss" where the loss was actually a bug in their filter...

What we couldn't do in Flume, but perhaps can do in the simple transform
for Connect, is the ability to route messages to different topics, with
"null" as one of the possible targets. This would allow you to implement
dead-letter-queue functionality and redirect messages that don't pass the
filter to an "errors" topic without getting rid of them completely, while
also allowing braver users to get rid of messages by directing them to
"null".
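Something like this, roughly (the Transformer interface below is
hypothetical, sketched from this discussion rather than taken from the
KIP; the SourceRecord constructor is the standard Connect one):

    import java.util.Map;
    import org.apache.kafka.connect.source.SourceRecord;

    // Hypothetical per-message transformer interface (initialize/transform);
    // a null return from transform() means "drop the message".
    interface Transformer<R> {
        void initialize(Map<String, String> config);
        R transform(R record);
    }

    // Filter that routes instead of silently dropping: records that pass go
    // through unchanged, failures are redirected to a dead-letter topic, and
    // only if no such topic is configured is the record dropped outright.
    class FilterOrRoute implements Transformer<SourceRecord> {

        private String errorTopic; // e.g. "errors"; null means "really drop"

        @Override
        public void initialize(Map<String, String> config) {
            this.errorTopic = config.get("error.topic");
        }

        @Override
        public SourceRecord transform(SourceRecord record) {
            if (passesFilter(record)) {
                return record;
            }
            if (errorTopic == null) {
                return null; // for the braver users: the message is gone
            }
            // Keep the message, just retarget it to the dead-letter topic.
            return new SourceRecord(record.sourcePartition(), record.sourceOffset(),
                    errorTopic, record.keySchema(), record.key(),
                    record.valueSchema(), record.value());
        }

        private boolean passesFilter(SourceRecord record) {
            return record.value() != null; // placeholder predicate
        }
    }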
Does that make sense?

Gwen

On Thu, Jul 14, 2016 at 8:33 PM, Nisarg Shah <snis...@gmail.com> wrote:
> Thank you for your inputs, Gwen and Michael.
>
> The original reason why I suggested set-based processing is the
> flexibility it provides. The JIRA had a comment by a user requesting a
> feature that could be achieved with this.
>
> After reading Gwen's and Michael's points (I went through the
> documentation and the code in detail), I agree with what you have to
> say. Also, fewer guarantees make what I had in mind less certain, so
> simplifying it to a single-message-based transformation would ensure
> that users who do require more flexibility with their transformations
> will automatically "turn to" Kafka Streams. Transformation logic on a
> message-by-message basis makes more sense.
>
> One use case that Kafka Connect could consider is adding or removing a
> message completely. (This was trivially possible with the collection
> passing.) Should users be pointed towards Kafka Streams even for this
> use case? I think this is a very useful feature for Connect too, and
> I'll try to rethink the API accordingly.
>
> Removing a message is as easy as returning a null and having the next
> transformer skip it, but adding messages would involve, say, a queue
> between transformers and a method that says "pass message" to the next,
> which can be called multiple times from one "transform" function; a
> variation on the chain of responsibility design pattern.
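(For what it's worth, a rough sketch of that null-skipping chain, reusing
the hypothetical Transformer interface from the earlier sketch:)

    import java.util.List;

    // Applies transformers to one message at a time. A null return from any
    // transformer "removes" the message: the chain short-circuits and nothing
    // is handed to the transformers (or the sink) downstream.
    class TransformerChain<R> {

        private final List<Transformer<R>> transformers;

        TransformerChain(List<Transformer<R>> transformers) {
            this.transformers = transformers;
        }

        R apply(R record) {
            for (Transformer<R> t : transformers) {
                if (record == null) {
                    return null; // filtered out by an earlier transformer
                }
                record = t.transform(record);
            }
            return record;
        }
    }

Adding messages is the part that doesn't fit this shape; a transformer
that emits more than one record would need something like the queue Nisarg
describes, with transform() pushing into it instead of returning a single
value.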
>> On Jul 12, 2016, at 12:54 AM, Michael Noll <mich...@confluent.io> wrote:
>>
>> As Gwen said, my initial thought is that message transformations that
>> are "more than trivial" should rather be done by Kafka Streams than by
>> Kafka Connect (for the reasons that Gwen mentioned).
>>
>> Transforming one message at a time would be a good fit for Kafka
>> Connect. An important use case is to remove sensitive data (such as
>> PII) from an incoming data stream before it hits Kafka's persistent
>> storage -- this use case can't be implemented well with Kafka Streams
>> because, by design, Kafka Streams is meant to read its input data from
>> Kafka (i.e., at the point when Kafka Streams could be used to remove
>> sensitive data fields, the data is already stored persistently in
>> Kafka, and this might be a no-go depending on the use case).
>>
>> I'm of course interested to hear what other people think.
>>
>>
>> On Tue, Jul 12, 2016 at 6:06 AM, Gwen Shapira <g...@confluent.io> wrote:
>>
>>> I think we need to restrict the functionality to one-message-at-a-time.
>>>
>>> Basically, Connect gives very few guarantees about the composition of
>>> the set (you may get the same messages over and over, a mix of old and
>>> new, etc.).
>>>
>>> In order to do useful things over a collection, you need better-defined
>>> semantics of what's included. Kafka Streams is putting tons of effort
>>> into having good windowing semantics, and I think apps that require
>>> modification of collections are a better fit there.
>>>
>>> I'm willing to change my mind though (it has been known to happen) --
>>> what are the comments about usage that point toward the collections
>>> approach?
>>>
>>> Gwen
>>>
>>> On Mon, Jul 11, 2016 at 3:32 PM, Nisarg Shah <snis...@gmail.com> wrote:
>>>
>>>> Thanks Jay, added that to the KIP.
>>>>
>>>> Besides reviewing the KIP as a whole, I wanted to know what everyone
>>>> thinks about the level at which the Transformer should deal with
>>>> data: transform the whole Collection of Records (giving the
>>>> flexibility of modifying messages across the set), OR transform
>>>> messages one at a time, iteratively, which restricts modifications
>>>> across messages.
>>>>
>>>> I'll get a working sample ready soon, to have a look. There were some
>>>> comments about Transformer usage that pointed to the first approach,
>>>> which I prefer too, given the flexibility.
>>>>
>>>>> On Jul 11, 2016, at 2:49 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>
>>>>> One minor thing: the Transformer interface probably needs a close()
>>>>> method (i.e. the opposite of initialize). This would be used by any
>>>>> transformer that uses a resource like a file/socket/db
>>>>> connection/etc. that needs to be closed. You usually don't need
>>>>> this, but when you do need it you really need it.
>>>>>
>>>>> -Jay
>>>>>
>>>>> On Mon, Jul 11, 2016 at 1:47 PM, Nisarg Shah <snis...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> This KIP
>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-66:+Add+Kafka+Connect+Transformers+to+allow+transformations+to+messages>
>>>>>> is for KAFKA-3209 <https://issues.apache.org/jira/browse/KAFKA-3209>.
>>>>>> It's about capabilities to transform messages in Kafka Connect.
>>>>>>
>>>>>> Some design decisions need to be taken, so please advise me on the
>>>>>> same. Feel free to express any thoughts or concerns as well.
>>>>>>
>>>>>> Many many thanks to Ewen Cheslack-Postava.
>>>>>>
>>>>>> -Nisarg
>>
>> --
>> Best regards,
>> Michael Noll
>>
>> Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
>> Download Apache Kafka and Confluent Platform: www.confluent.io/download
>> <http://www.confluent.io/download>
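Re: Jay's note about close() above, a minimal sketch of the full
lifecycle on the hypothetical Transformer interface (not the final KIP
API):

    import java.util.Map;

    // Hypothetical lifecycle: initialize once, transform once per record,
    // close once at shutdown.
    interface Transformer<R> {

        // Called once before any records flow through.
        void initialize(Map<String, String> config);

        // Called once per record; may return null to drop the record.
        R transform(R record);

        // Called once at shutdown; the place to release files, sockets,
        // database connections, and other external resources.
        void close();
    }

Without close(), a transformer holding something like a database
connection has no hook to release it when the task stops.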