Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2017-01-03 Thread Shikhar Bhushan
Makes sense Ewen, I edited the KIP to include this criteria. I'd like to start a voting thread soon unless anyone has additional points for discussion. On Fri, Dec 30, 2016 at 12:14 PM Ewen Cheslack-Postava wrote: On Thu, Dec 15, 2016 at 7:41 PM, Shikhar Bhushan wrote: > There is no decision

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-30 Thread Ewen Cheslack-Postava
On Thu, Dec 15, 2016 at 7:41 PM, Shikhar Bhushan wrote: > There is no decision being proposed on the final list of transformations > that will ever be in Kafka :-) Just the initial set we should roll with. > I'd second this comment as well. I'm very wary of the slippery slope, which is why I was

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
There is no decision being proposed on the final list of transformations that will ever be in Kafka :-) Just the initial set we should roll with. On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira wrote: You are absolutely right that the vast majority of NiFi's processors are not what we would conside

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Gwen Shapira
You are absolutely right that the vast majority of NiFi's processors are not what we would consider SMT. I went over the list and I think they still contain just short of 50 legit SMTs: https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+NiFi+Transformations You are right that ExtractHL7 i

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Ewen Cheslack-Postava
I think there are a couple of factors that make transformations and connectors different. First, NiFi's 150 processors is a bit misleading. In NiFi, processors cover data sources, data sinks, serialization/deserialization, *and* transformations. I haven't filtered the list to see how many fall int

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I think the tradeoffs for including connectors are different. Connectors are comparatively larger in scope, they tend to come with their own set of dependencies for the systems they need to talk to. Transformations as I imagine them - at least the ones on the table in the wiki currently - should be

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Gwen Shapira
I agree about the ease of use in adding a small subset of built-in transformations. But the same thing is true for connectors - there are maybe 5 super popular OSS connectors and the rest is a very long tail. We drew the line at not adding any, because that's the easiest and because we did not want

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I have updated KIP-66 https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect with the changes I proposed in the design. Gwen, I think the main downside to not including some transformations with Kafka Connect is that it seems less user friendly if f

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-14 Thread Gwen Shapira
I'm a bit concerned about adding transformations in Kafka. NiFi has 150 processors, presumably they are all useful for someone. I don't know if I'd want all of that in Apache Kafka. What's the downside of keeping it out? Or at least keeping the built-in set super minimal (Flume has like 3 built-in

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-14 Thread Shikhar Bhushan
With regard to a), just using `ConnectRecord` with `newRecord` as a new abstract method would be a fine choice. In prototyping, both options end up looking pretty similar (in terms of how transformations are implemented and the runtime initializes and uses them) and I'm starting to lean towards not
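
A minimal sketch of the idea above: an abstract `newRecord` method lets one transformation be written once and still return the caller's concrete record type. The classes below are simplified, hypothetical stand-ins (names and the two-field shape are assumptions for illustration), not the actual Connect classes.

```java
// Simplified, hypothetical stand-ins for Connect's record classes (illustration only).
abstract class SketchRecord<R extends SketchRecord<R>> {
    final String topic;
    final String value;

    SketchRecord(String topic, String value) {
        this.topic = topic;
        this.value = value;
    }

    // Each subclass builds a copy of its own concrete type.
    abstract R newRecord(String topic, String value);
}

final class SketchSourceRecord extends SketchRecord<SketchSourceRecord> {
    SketchSourceRecord(String topic, String value) {
        super(topic, value);
    }

    @Override
    SketchSourceRecord newRecord(String topic, String value) {
        return new SketchSourceRecord(topic, value);
    }
}

final class SketchSinkRecord extends SketchRecord<SketchSinkRecord> {
    SketchSinkRecord(String topic, String value) {
        super(topic, value);
    }

    @Override
    SketchSinkRecord newRecord(String topic, String value) {
        return new SketchSinkRecord(topic, value);
    }
}

final class RerouteSketch {
    // One generic transformation: changes the topic, keeps the value,
    // and returns the same concrete record type it was given.
    static <R extends SketchRecord<R>> R reroute(R record, String newTopic) {
        return record.newRecord(newTopic, record.value);
    }
}
```

With this shape, `RerouteSketch.reroute(sinkRecord, "orders-archive")` yields a `SketchSinkRecord` and the same call on a source record yields a `SketchSourceRecord`, so no separate per-type methods are needed.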

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-10 Thread Ewen Cheslack-Postava
If anyone has time to review here, it'd be great to get feedback. I'd imagine that the proposal itself won't be too controversial -- keeps transformations simple (by only allowing map/filter), doesn't affect the rest of the framework much, and fits in with general config structure we've used elsewh

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-07 Thread Shikhar Bhushan
Hi all, I have another iteration at a proposal for this feature here: https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design I'd welcome your feedback and comments. Thanks, Shikhar On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava wrote: On Thu, Jul 28, 201

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-08-02 Thread Ewen Cheslack-Postava
On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan wrote: > > > > > > Hmm, operating on ConnectRecords probably doesn't work since you need to > > emit the right type of record, which might mean instantiating a new one. > I > > think that means we either need 2 methods, one for SourceRecord, one f

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-28 Thread Shikhar Bhushan
> > > Hmm, operating on ConnectRecords probably doesn't work since you need to > emit the right type of record, which might mean instantiating a new one. I > think that means we either need 2 methods, one for SourceRecord, one for > SinkRecord, or we'd need to limit what parts of the message you ca

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-28 Thread Ewen Cheslack-Postava
On Thu, Jul 28, 2016 at 1:13 PM, Shikhar Bhushan wrote: > Some thoughts on the KIP and single-message transforms in general. > > * When does transformation take place? In the KIP, it seems like the > connector-implemented task is responsible for calling into the > transformation logic. I'd propos

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-28 Thread Shikhar Bhushan
Some thoughts on the KIP and single-message transforms in general. * When does transformation take place? In the KIP, it seems like the connector-implemented task is responsible for calling into the transformation logic. I'd propose that, - for source connectors, the transformer chain operates o

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-25 Thread Michael Noll
API question regarding dead letter queues / sending messages to a null topic in order to get rid of them: I assume we wouldn't suggest that users actually pass `null` into some method, but rather have a proper and descriptive API method such as `discard()` (this name is just an example)? On Sat, Ju

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-23 Thread Ewen Cheslack-Postava
On Fri, Jul 22, 2016 at 12:58 AM, Shikhar Bhushan wrote: > flatMap() / supporting 1->n feels nice and general since filtering is just > the case of going from 1->0 > > I'm not sure why we'd need to do any more granular offset tracking (like > sub-offsets) for source connectors: after transformati

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-22 Thread Shikhar Bhushan
flatMap() / supporting 1->n feels nice and general since filtering is just the case of going from 1->0 I'm not sure why we'd need to do any more granular offset tracking (like sub-offsets) for source connectors: after transformation of a given record to n records, all of those n should map to same
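
A rough sketch of the 1->n idea above, assuming every output record simply inherits the offset of the input it was derived from (our reading of the message; the thread does not settle this). `OffsetRecord`, `FlatMapSketch`, and `splitBatch` are hypothetical names for illustration. Filtering falls out as the case where the function returns an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical record type: a value plus the offset of the record it came from.
final class OffsetRecord {
    final long offset;
    final String value;

    OffsetRecord(long offset, String value) {
        this.offset = offset;
        this.value = value;
    }
}

final class FlatMapSketch {
    // Applies fn to each input value; every output keeps its input's offset.
    static List<OffsetRecord> flatMap(List<OffsetRecord> input,
                                      Function<String, List<String>> fn) {
        List<OffsetRecord> out = new ArrayList<>();
        for (OffsetRecord in : input) {
            for (String v : fn.apply(in.value)) {
                out.add(new OffsetRecord(in.offset, v));
            }
        }
        return out;
    }

    // Example function: split a comma-separated batch into single values.
    // An empty value produces zero outputs, i.e. the record is filtered out.
    static List<String> splitBatch(String value) {
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(","));
    }
}
```

Under this assumption no sub-offsets are needed, but committing an offset is only safe once all n outputs derived from it have been handled.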

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-21 Thread Ewen Cheslack-Postava
Jun, The problem with it not being 1-1 is that Connect relies heavily on offsets, so we'd need to be able to track offsets at this finer granularity. Filtering is ok, but flatMap isn't. If you convert one message to many, what are the offsets for the new messages? One possibility would be to assume

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-17 Thread Jun Rao
Does the transformation need to be 1-to-1? For example, some users model each Kafka message as schema + a batch of binary records. When using a sink connector to push the Kafka data to a sink, it would be useful if the transformer can convert each Kafka message to multiple records. Thanks, Jun O

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-16 Thread Nisarg Shah
Gwen, Yup, that sounds great! Instead of keeping it up to the transformers to handle null, we can instead have the topic as null. Sounds good. To get rid of a message, set the topic to a special one (could be as simple as null). Like I said before, the more interesting part would be ‘adding’

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-14 Thread Gwen Shapira
I used to work on Apache Flume, where we used to allow users to filter messages completely in the transformation and then we got rid of it, because we spent too much time trying to help users who had "message loss", where the loss was actually a bug in the filter... What we couldn't do in Flume, b

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-14 Thread Nisarg Shah
Thank you for your inputs Gwen and Michael. The original reason why I suggested set-based processing is because of the flexibility it provides. The JIRA had a comment by a user requesting a feature that could be achieved with this. After reading Gwen and Michael's points, (I went through the

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-12 Thread Michael Noll
As Gwen said, my initial thought is that message transformations that are "more than trivial" should be done by Kafka Streams rather than by Kafka Connect (for the reasons that Gwen mentioned). Transforming one message at a time would be a good fit for Kafka Connect. An important use case

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-11 Thread Gwen Shapira
I think we need to restrict the functionality to one-message-at-a-time. Basically, Connect gives very few guarantees about the size of the set or its composition (you may get the same messages over and over, a mix of old and new, etc.) In order to do useful things over a collection, you need better d

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-11 Thread Nisarg Shah
Thanks Jay, added that to the KIP. Besides reviewing the KIP as a whole, I wanted to know what everyone thinks about what data should be dealt with at the Transformer level. Transform the whole Collection of Records (giving the flexibility of modifying messages across the set) OR Transform me

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-11 Thread Jay Kreps
One minor thing, the Transformer interface probably needs a close() method (i.e. the opposite of initialize). This would be used for any transformer that uses a resource like a file/socket/db connection/etc that needs to be closed. You usually don't need this but when you do need it you really need
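
A small sketch of the lifecycle suggested above, in which `close()` mirrors `initialize` and is always called once processing ends. The interface shape, the `transform` method, and the `LoggingTransformer` and `CloseSketch` classes are assumptions made for illustration; they are not the interface defined in the KIP.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration only; not the API defined by KIP-66.
interface Transformer<R> extends AutoCloseable {
    void initialize(Map<String, String> config);

    R transform(R record);

    // Releases any resource (file, socket, DB connection) opened in initialize.
    @Override
    void close();
}

// Toy transformer that records lifecycle calls so the ordering is visible.
final class LoggingTransformer implements Transformer<String> {
    final List<String> events = new ArrayList<>();
    private boolean open;

    @Override
    public void initialize(Map<String, String> config) {
        open = true;
        events.add("initialize");
    }

    @Override
    public String transform(String record) {
        if (!open) {
            throw new IllegalStateException("transformer is closed");
        }
        events.add("transform");
        return record.trim();
    }

    @Override
    public void close() {
        open = false;
        events.add("close");
    }
}

final class CloseSketch {
    // The runtime side: close() runs in finally, even if a transform throws.
    static List<String> run(List<String> input, LoggingTransformer t) {
        List<String> out = new ArrayList<>();
        t.initialize(Collections.<String, String>emptyMap());
        try {
            for (String r : input) {
                out.add(t.transform(r));
            }
        } finally {
            t.close();
        }
        return out;
    }
}
```

Putting the `close()` call in a `finally` block is what makes it dependable for the resource-holding transformers the message has in mind.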

[DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-11 Thread Nisarg Shah
Hello, This KIP is for KAFKA-3209. It’s about capabilities to transform messages in Kafka Connect. Some design deci