> Hmm, operating on ConnectRecords probably doesn't work since you need to
> emit the right type of record, which might mean instantiating a new one. I
> think that means we either need 2 methods, one for SourceRecord, one for
> SinkRecord, or we'd need to limit what parts of the message you can modify
> (e.g. you can change the key/value via something like
> transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> fields would remain the same and the fmwk would handle allocating new
> Source/SinkRecords if needed)
Good point, perhaps we could add an abstract method on ConnectRecord that
takes all the shared fields as parameters and the implementations return a
copy of the narrower SourceRecord/SinkRecord type as appropriate (see the
first sketch at the end of this message). Transformers would only operate
on ConnectRecord rather than caring about SourceRecord or SinkRecord (in
theory they could instanceof/cast, but the API should discourage it).

> Is there a use case for hanging on to the original? I can't think of a
> transformation where you'd need to do that (or couldn't just order things
> differently so it isn't a problem).

Yeah, maybe this isn't really necessary. No strong preference here.

> That said, I do worry a bit that farming too much stuff out to
> transformers can result in "programming via config", i.e. a lot of the
> simplicity you get from Connect disappears in long config files.
> Standardization would be nice and might just avoid this (and doesn't cost
> that much implementing it in each connector), and I'd personally prefer
> something a bit less flexible but consistent and easy to configure.

Not sure what you're suggesting :-) Standardized config properties for a
small set of transformations, leaving it up to connectors to integrate?

> Personally I'm skeptical of that level of flexibility in transformers --
> its getting awfully complex and certainly takes us pretty far from
> "config only" realtime data integration. It's not clear to me what the
> use cases are that aren't covered by a small set of common
> transformations that can be chained together (e.g. rename/remove fields,
> mask values, and maybe a couple more).

I agree that we should have some standard transformations that we ship with
Connect and that users would ideally lean towards for routine tasks. The
ones you mention are good candidates where I'd imagine we can expose simple
config, e.g.

transform.filter.whitelist=x,y,z # filter to a whitelist of fields
transform.rename.spec=oldName1=>newName1, oldName2=>newName2
topic.rename.replace=-/_
topic.rename.prefix=kafka_

etc.

However, the ecosystem will invariably have more complex transformers if we
make this pluggable. And because ETL is messy, that's probably a good thing
if folks are able to do their data munging orthogonally to connectors, so
that connectors can focus on the logic of how data should be copied from/to
datastores and Kafka.

> In any case, we'd probably also have to change configs of connectors if
> we allowed configs like that since presumably transformer configs will
> just be part of the connector config.

Yeah, I haven't thought much about how all the configuration would tie
together... I think we'd need the ability to:

- spec the transformer chain (fully-qualified class names? perhaps special
  aliases for built-in ones? perhaps third-party FQCNs can be assigned
  aliases by users in the chain spec, for easier configuration and to
  uniquely identify a transformation when it occurs more than once in a
  chain?)
- configure each transformer -- all properties prefixed with that
  transformer's ID (FQCN / alias) get routed to it

(See the configuration sketch at the end of this message for one way this
could look.)

Additionally, I think we would probably want to allow for topic-specific
overrides <https://issues.apache.org/jira/browse/KAFKA-3962> (e.g. you
want certain transformations for one topic, but different ones for
another...)
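
To make the copy-method idea concrete, here's a rough sketch of what it
might look like. All of the names here (newRecord, the accessors, the
parameter list) are invented for illustration, not taken from the existing
API:

import java.util.Map;
import org.apache.kafka.connect.data.Schema;

// Illustrative sketch only: method and accessor names are hypothetical.
public abstract class ConnectRecord {
    private final String topic;
    private final Integer kafkaPartition;
    private final Schema keySchema;
    private final Object key;
    private final Schema valueSchema;
    private final Object value;

    protected ConnectRecord(String topic, Integer kafkaPartition,
                            Schema keySchema, Object key,
                            Schema valueSchema, Object value) {
        this.topic = topic;
        this.kafkaPartition = kafkaPartition;
        this.keySchema = keySchema;
        this.key = key;
        this.valueSchema = valueSchema;
        this.value = value;
    }

    public String topic() { return topic; }
    public Integer kafkaPartition() { return kafkaPartition; }
    public Schema keySchema() { return keySchema; }
    public Object key() { return key; }
    public Schema valueSchema() { return valueSchema; }
    public Object value() { return value; }

    // Each subclass returns a copy of its own narrower type with the shared
    // fields replaced, carrying its type-specific fields over unchanged.
    public abstract ConnectRecord newRecord(String topic, Integer kafkaPartition,
                                            Schema keySchema, Object key,
                                            Schema valueSchema, Object value);
}

class SourceRecord extends ConnectRecord {
    private final Map<String, ?> sourcePartition;
    private final Map<String, ?> sourceOffset;

    public SourceRecord(Map<String, ?> sourcePartition, Map<String, ?> sourceOffset,
                        String topic, Integer kafkaPartition,
                        Schema keySchema, Object key,
                        Schema valueSchema, Object value) {
        super(topic, kafkaPartition, keySchema, key, valueSchema, value);
        this.sourcePartition = sourcePartition;
        this.sourceOffset = sourceOffset;
    }

    @Override
    public SourceRecord newRecord(String topic, Integer kafkaPartition,
                                  Schema keySchema, Object key,
                                  Schema valueSchema, Object value) {
        // Source partition/offset are preserved; only shared fields change.
        return new SourceRecord(sourcePartition, sourceOffset, topic,
                                kafkaPartition, keySchema, key, valueSchema, value);
    }
}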
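
A transformer would then only ever see the base type, and chaining is just
applying the transformers in order. Again, all naming here is hypothetical:

import java.util.List;

// Continuing the sketch above: transformers operate purely on ConnectRecord.
public interface Transformation {
    ConnectRecord apply(ConnectRecord record);
}

// Example: prefix the topic name. Because it goes through newRecord(), the
// same transformer works for source and sink records alike.
class TopicPrefix implements Transformation {
    private final String prefix;

    TopicPrefix(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public ConnectRecord apply(ConnectRecord r) {
        return r.newRecord(prefix + r.topic(), r.kafkaPartition(),
                           r.keySchema(), r.key(), r.valueSchema(), r.value());
    }
}

// The framework would apply a configured chain in order:
class TransformationChain {
    static ConnectRecord apply(List<Transformation> chain, ConnectRecord record) {
        ConnectRecord current = record;
        for (Transformation t : chain) {
            current = t.apply(current);
        }
        return current;
    }
}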
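
And on the configuration side, one possible shape for the chain spec and
per-transformer properties, in the same style as the examples earlier in
this message (all property names invented for illustration):

# Hypothetical connector config -- property names are invented.
# The chain: two built-in aliases plus a third-party transformer the user
# has bound to the alias "munge", so it can be configured (and even appear
# more than once in the chain) unambiguously.
transform.chain=filter,rename,munge
transform.munge.class=com.example.MungeTransformer

# Properties prefixed with a transformer's alias get routed to it.
transform.filter.whitelist=x,y,z
transform.rename.spec=oldName1=>newName1,oldName2=>newName2
transform.munge.mode=aggressive

# A topic-specific override in the spirit of KAFKA-3962: a different chain
# for one particular topic.
transform.chain.some-topic=rename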