Re: [DISCUSS] KIP-585: Conditional SMT

Christopher Egerton Wed, 01 Apr 2020 13:43:16 -0700

Hi Tom,

This looks great and I'd love to see the out-of-the-box SMTs become even
more powerful with the improvements you've proposed! Got a few remarks and
would be interested in your thoughts:


1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
what would you think about adopting a similar approach as the framework
uses with connectors, by adding a "Config validate(Map<String, String>
props)" method that can perform custom validation outside of what can be
performed by the ConfigDef's single-property-at-a-time validation? It may
be a little heavyweight for use with this particular SMT, but it'd provide
more flexibility for other SMT implementations and would mirror an API that
developers targeting the framework are likely already familiar with.
2. The possibility for adding the logical operators "and" and "or" is
mentioned, but only as a potential future change and not one proposed by
this KIP. Why not include those operators sooner rather than later?
3. The syntax for named conditions that are then referenced in logical
operators is... tricky. It took me a few attempts to grok the example
provided in the KIP after reading Sönke's question about the example for
negation. What about a more sophisticated but less verbose syntax that
supports a single configuration for the condition, even with logical
operators? I'm thinking something like
"transforms.conditionalExtract.condition: not(has-header:my-header)"
instead of the "transforms.conditionalExtract.condition: not:hasMyHeader"
and "transforms.conditionalExtract.condition.hasMyHeader:
has-header:my-header" properties. If support for a logical "and" is added,
this could then be expanded to something like
"transforms.conditionalExtract.condition: and(has-header(my-header),
not(topic-matches(my-prefix-.*)))". There would be additional complexity
here with the need to escape parentheses and commas that are intended to be
treated literally (as part of a header name, for example) instead of as
part of the syntax for the condition itself, but a little additional
complexity for edge cases like that may be warranted if it heavily reduces
complexity for the common cases. The rationale for the proposed
parentheses-based syntax here instead of what's mentioned in the KIP
(something like "and: <condition1>, <condition2>") is to help with
readability; we probably wouldn't need that with the approach of naming
conditions via separate properties, but things may get a little nasty with
literal conditions included there, especially with the possibility of
nesting. Finally, the shift in syntax here should make it easier to handle
cases like the header value matching brought up by Sönke; it might look
something like "header-matches(header-name, header-value-pattern)".

Cheers,

Chris

On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Sönke,
>
> Thanks for taking a look.
>
> Let me answer in reverse order, since I think it might make more sense that
> way...
>
> Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
>
> The example
>     transforms.conditionalExtract.condition: has-header:my-header
> is a "has-header" condition (the prefix of the config value), which will
> match records with a "my-header" header (given in the suffix).
>
> The other example given is:
>     transforms.conditionalExtract.condition: not:hasMyHeader
>     transforms.conditionalExtract.condition.hasMyHeader:
> has-header:my-header
> The root of the condition is a "not" condition (the prefix of the value for
> the transforms.conditionalExtract.condition key) of another named condition
> called "hasMyHeader" (the suffix). Any name could be used for the other
> condition. That other condition is configured at
> "transforms.conditionalExtract.condition.<conditionName>". That condition
> is a "has-header" condition (the prefix), which will match records with a
> "my-header" header (given in the suffix). So the "has-header:" condition
> type would always require a suffix, as would the "not:" condition type.
> Hypothetically you could have a "true" condition type (which would not
> require a suffix), and the hypothetical binary conditions "and:" and "or:"
> would require a pair of other condition names.
>
> So what's proposed is a scheme for encoding conditions where the condition
> type is the prefix of the value of some "....condition" config key, and the
> optional suffix provides parameters for the condition. This makes those
> parameters a bit inflexible, but is relatively succinct.
>
> This leads on to your first point. You're right that use cases might appear
> which need other conditions, and we should make it flexible enough to be
> able to cope with future use cases. On the other hand, I was concerned that
> we end up with something which is quite complicated to configure. (There
> comes a point where it might makes more sense for the user to write their
> own SMT).
>
> Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
>
> That won't work because the format is basically a Properties/Map<String,
> String> and what you've suggested has duplicate keys.
>
> One thing I did briefly consider what the ability to treat conditions as
> Configurable objects in their own right (opening up the possibility of
> people supplying their own Conditions, just like they can supply their own
> SMTs). That might be configured something like this:
>
>     transforms.conditionalExtract.condition: not
>     transforms.conditionalExtract.condition.not.type: Not
>     transforms.conditionalExtract.condition.not.negated: foo
>     transforms.conditionalExtract.condition.foo.type: HasHeaderWithValue
>     transforms.conditionalExtract.condition.foo.header: my-header
>     transforms.conditionalExtract.condition.foo.value: my-value
>
> I didn't propose that I couldn't see the use cases to justify this kind of
> complexity, especially as the common case would surely be matching against
> topic name (to be honest I wasn't completely convinced by the need for
> "has-header"). In the current version of the KIP that's just
>
>     transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
>
> but using the more flexible scheme that would require something more like
> this:
>
>     transforms.conditionalExtract.condition: bar
>     transforms.conditionalExtract.condition.bar.type: TopicMatch
>     transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
>
> If people know of use cases which would justify more sophistication, I'm
> happy to reconsider.
>
> Thanks again for taking a look!
>
> Tom
>
> On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> <soenke.lie...@opencore.com.invalid> wrote:
>
> > Hi Tom,
> >
> > sounds useful to me, thanks for the KIP!
> > The only thought that I had while reading was that this will probably
> raise
> > questions about more involved conditions fairly quickly. For example the
> > "has-header" will cause an appetite for conditions like
> > "this-header-has-that-value".
> > This would necessitate two parameters to be passed into the condition,
> > which I think is not currently included in the KIP. I am not saying add
> > this now, but might it make sense to discuss a concept of how that might
> > look now, to avoid potential changes later on.
> >
> > Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
> >
> > Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
> >
> > Is this intentional and the name is optional?
> >
> >
> > Best regards,
> >
> > Sönke
> >
> >
> >
> > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tbent...@redhat.com> wrote:
> > >
> > > Does anyone have any comments, feedback or thoughts about this?
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <tbent...@redhat.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > feedback:
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > >
> > > > Many thanks,
> > > >
> > > > Tom
> > > >
> >
>

Re: [DISCUSS] KIP-585: Conditional SMT

Reply via email to