Hi Konstantine, Thanks very much for your input. It's helpful to have a steer from a committer on the syntax point especially. I've made all those changes and added a note about compatibility.
Cheers, Tom On Thu, May 14, 2020 at 6:00 AM Konstantine Karantasis < konstant...@confluent.io> wrote: > Hi Tom. > > Thanks for the KIP. I like how the proposal has ended up to be and I think > it describes a practical approach. > > I have to say that, for a moment, earlier in the discussion I thought we > were leaning a bit towards an unconventional mini assembly language based > on java properties. > The reliance on java properties for SMT definition and configuration has > drawn some criticism in the past, so keeping the model lean and appealing > as well as expressive is worth the effort in my opinion. > > With that in mind, I feel that we could move forward without the question > mark notation. That approach would be similar to what was introduced when > SMTs were implemented by adding the `transforms` prefix. We'll have a top > level property called `predicates` and a namespace starting with the > `predicates.` prefix followed by aliases. Avoiding the '?' in the > transforms seems also ok to me in order to keep things simple and > intuitive. > > Besides these suggestions, I have a few additional minor comments: > > 1. Should we extend the `AutoCloseable` interface instead of `Closeable` > since we ignore the Exception in the signature anyways? I know this will > make Predicate different from Transformation but it also corresponds better > to the intended use. > 2. As Andrew mentioned, Predicate needs to be declared an interface in the > KIP. > 3. Currently examples are neither java properties nor json. Should we > comply with one of the two formats to avoid confusing users that will read > the KIP before trying this feature? > > Thanks again for driving this proposal. > > Konstantine > > On Mon, May 11, 2020 at 7:57 AM Tom Bentley <tbent...@redhat.com> wrote: > > > Hi Andrew, > > > > That works nicely enough for the proposal where the predicate is > configured > > directly on the transformation. But I thought there was more consensus > > around the proposal to have the transformation configuration refer to a > > predicate indirectly, defined with the ?predicates key. I guess an > example > > would look like this: > > > > transforms: t2 > > transforms.t2?predicate: has-prefix > > transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key > > transforms.t2.field: c1 > > ?predicates: has-prefix > > ?predicates.has-prefix.type: > > org.apache.kafka.connect.transforms.predicates.TopicNameMatch > > ?predicates.has-prefix.negate: true > > ?predicates.has-prefix.pattern: my-prefix-.* > > > > I agree that this seems to reduce the chance of conflict to practically > > nothing. I'm happy enough to go with this and I've updated the KIP > > accordingly. > > > > Assuming no one raises more concerns I'll start a vote soon. > > > > Kind regards, > > > > Tom > > > > On Mon, May 11, 2020 at 10:54 AM Andrew Schofield < > > andrew_schofi...@live.com> > > wrote: > > > > > Hi, > > > I have implemented some of this and configured some predicates and I > > > prefer this slight modification of the ? syntax: > > > > > > transforms: t2 > > > transforms.t2?type: > > > org.apache.kafka.connect.transforms.predicates.TopicNameMatch > > > transforms.t2?negate: true > > > transforms.t2?pattern: my-prefix-.* > > > transforms.t2.type: > org.apache.kafka.connect.transforms.ExtractField$Key > > > transforms.t2.field: c1 > > > > > > So, "transforms.<alias>." introduces the config for the transformation, > > > while "transforms.<alias>?" introduces the > > > config for the predicate. The risk of a clash now is that someone has > > used > > > a ? in the alias name, which is unlikely. > > > > > > Also, there's a slight typo in the Java definition of the Predicate > > > interface. It should be: > > > > > > public interface Predicate<R extends ConnectRecord<R>> extends > > > Configurable, Closeable > > > > > > Looks like discussion has died down so I wonder whether it's time to > > begin > > > a vote. > > > > > > Cheers, > > > Andrew > > > > > > On 06/05/2020, 08:57, "Andrew Schofield" <andrew_schofi...@live.com> > > > wrote: > > > > > > > Hi Tom, > > > > I think that either proposal is sufficiently readable and they > both > > > have elements of > > > > cryptic syntax. I remember talking to an engineering director once > > > who said he didn't hire > > > > "curly braces people" any more. This is an interface for curly > > braces > > > people whichever > > > > way we go here. > > > > > > > I slightly prefer the predicates as a top-level concept rather > than > > > as a feature > > > > of the If transform. I also don't like the namespacing behaviour > of > > > the If transform which > > > > I guess works like variable declaration so each scope has its own > > > namespace of transform > > > > aliases. I expect the following is valid but ill-advised. > > > > > > > transforms: t1 > > > > transforms.t1.type: If > > > > transforms.t1.!test.type: TopicNameMatch > > > > transforms.t1.!test.pattern: my-prefix-.* > > > > transforms.t1.then: t1 > > > > transforms.t1.then.t1.type: ExtractField$Key > > > > transforms.t1.then.t1.field: c1 > > > > > > > I would love to see conditional application of transforms in SMT > in > > > Kafka 2.6. > > > > Let's see what other people think and get a consensus view on how > to > > > spell it. > > > > > > > Cheers, > > > > Andrew > > > > > > > On 05/05/2020, 15:20, "Tom Bentley" <tbent...@redhat.com> wrote: > > > > > > > Hi, > > > > > > > Again, reusing the existing numbering... > > > > > > > 3) I had exactly the same thought. The reason I didn't already > > > rename it > > > > was because : > > > > > > > * Filter is not exactly wrong (by default it just filters out > > all > > > the > > > > messages) > > > > * Filter is a useful hint about how it's intended to be used. > > > > > > > But I don't have a strong opinion, so I'm happy to rename it > if > > > people > > > > think that Drop is better. > > > > > > > 4) Putting the negation on the use site–rather than the > > > declaration > > > > site–seems like a good idea, since it makes it easier to get > > > 'else'-like > > > > behaviour. > > > > > > > But, playing devil's advocate, I wondered how all this > compares > > > with the > > > > original proposal, so I went to the trouble of writing out a > few > > > examples > > > > to compare the two > > > > > > > Example of a basic 'if' > > > > Current proposal: > > > > transforms: t2 > > > > transforms.t2.?predicate: !has-my-prefix > > > > transforms.t2.type: ExtractField$Key > > > > transforms.t2.field: c1 > > > > ?predicates: has-my-prefix > > > > ?predicates.has-my-prefix.type: TopicNameMatch > > > > ?predicates.has-my-prefix.pattern: my-prefix-.* > > > > Original (or originalish, see notes below): > > > > transforms: t1 > > > > transforms.t1.type: If > > > > transforms.t1.!test.type: TopicNameMatch > > > > transforms.t1.!test.pattern: my-prefix-.* > > > > transforms.t1.then: t2 > > > > transforms.t1.then.t2.type: ExtractField$Key > > > > transforms.t1.then.t2.field: c1 > > > > > > > Example of 'if/else': > > > > Current: > > > > transforms: t2,t3 > > > > transforms.t2.?predicate: !has-my-prefix > > > > transforms.t2.type: ExtractField$Key > > > > transforms.t2.field: c1 > > > > transforms.t3.?predicate: has-my-prefix > > > > transforms.t3.type: ExtractField$Key > > > > transforms.t3.field: c2 > > > > ?predicates: has-my-prefix > > > > ?predicates.has-my-prefix.type: TopicNameMatch > > > > ?predicates.has-my-prefix.pattern: my-prefix-.* > > > > Original: > > > > transforms: t1 > > > > transforms.t1.type: If > > > > transforms.t1.!test.type: TopicNameMatch > > > > transforms.t1.!test.pattern: my-prefix-.* > > > > transforms.t1.then: t2 > > > > transforms.t1.then.t2.type: ExtractField$Key > > > > transforms.t1.then.t2.field: c1 > > > > transforms.t1.else: t3 > > > > transforms.t1.else.t3.type: ExtractField$Key > > > > transforms.t1.else.t3.field: c2 > > > > > > > Example of guarding a >1 SMT > > > > Current: > > > > transforms: t2,t3 > > > > transforms.t2.?predicate: !has-my-prefix > > > > transforms.t2.type: ExtractField$Key > > > > transforms.t2.field: c1 > > > > transforms.t3.?predicate: !has-my-prefix > > > > transforms.t3.type: ExtractField$Value > > > > transforms.t3.field: c2 > > > > ?predicates: has-my-prefix > > > > ?predicates.has-my-prefix.type: TopicNameMatch > > > > ?predicates.has-my-prefix.pattern: my-prefix-.* > > > > Original: > > > > transforms: t1 > > > > transforms.t1.type: If > > > > transforms.t1.!test.type: TopicNameMatch > > > > transforms.t1.!test.pattern: my-prefix-.* > > > > transforms.t1.then: t2,t3 > > > > transforms.t1.then.t2.type: ExtractField$Key > > > > transforms.t1.then.t2.field: c1 > > > > transforms.t1.then.t3.type: ExtractField$Value > > > > transforms.t1.then.t3.field: c2 > > > > > > > Notes: > > > > * I'm assuming we would negate predicates in the Current > > proposal > > > by > > > > prefixing the predicate name with ! > > > > * I'm using "test" rather than "predicate" in the Original > > > because it looks > > > > a bit nicer having the same number of letters as "then" and > > > "else". > > > > * I'm negating the test in the Original by using "!test" > rather > > > than "test". > > > > > > > Those notes aside, I'd make the following assertions: > > > > > > > 1. The Original examples require the same number of keys (or > one > > > less line > > > > in the last example). > > > > 2. The Original examples are more guessable if you'd never > seen > > > this KIP > > > > before. The SMT name (If) and its config parameter names > ('test' > > > and 'then' > > > > and maybe also 'else') all give quite strong clues about > what's > > > happening. > > > > I'd also argue that using !test is less cryptic than > > > '?predicate', since > > > > the purpose of the '!' fits with its use in most programming > > > languages. The > > > > purpose of the '?' in '?predicate' is to reduce the chance of > a > > > naming > > > > collision, which is not something which people could guess. > > > > 3. The Original examples cannot have config parameters naming > > > conflicts > > > > with either Connectors or Transformations. > > > > > > > Although I don't think it's a strong argument I would also > add: > > > > > > > 4. Because the original syntax supports nesting of 'If' SMTs > > it's > > > actually > > > > more powerful. > > > > > > > So, I know people really didn't like the original suggestion, > > but > > > if you > > > > accept these assertions then the current proposal is worse. > Can > > > anyone > > > > point to some objective ways in which the current proposal is > > > better than > > > > the original one? > > > > > > > If people really think the current proposal is better then I'm > > > happy to go > > > > with that, call a vote and see if it has any traction with > > people > > > outside > > > > this discussion. Ultimately I just want to get support for > this > > > feature > > > > into Kafka, however it's configured. > > > > > > > Kind regards, > > > > > > > Tom > > > > > > > On Mon, May 4, 2020 at 12:35 PM Andrew Schofield < > > > andrew_schofi...@live.com> > > > > wrote: > > > > > > > > Hi Tom, > > > > > Referring back to Chris's points. > > > > > > > > > > 1) Exception handling - yes, agreed. This KIP can just say > the > > > exception > > > > > handling for > > > > > predicates is the same as transformations. > > > > > > > > > > 2) Worker plugins - agree too. > > > > > > > > > > 3) Filter SMT - If the Filter SMT is just becoming a > > > transformation that > > > > > drops any records that > > > > > make it through the predicate, it's probably actually a Drop > > > > > transformation. If it's > > > > > not guarded with a predicate, it drops every record, but > with > > a > > > predicate > > > > > it just > > > > > drops records that match. > > > > > > > > > > 4) Syntax - I quite like the idea of a separate namespace > for > > > predicates, > > > > > just like for transformations. > > > > > I particularly like how you could have 2 transformations > > > referencing the > > > > > same predicate and config. > > > > > > > > > > There is the scope for collision with connector > configuration, > > > but my > > > > > hunch is that > > > > > "predicates" is less risky that simply "predicate" since it > > > would only > > > > > clash with a connector > > > > > that happened to have multiple predicates. > > > > > > > > > > But I think there is a non-zero chance that an existing > > > transformation > > > > > might have used > > > > > "predicate" as a config. I have actually written a filter > > > transformation > > > > > myself. It didn't use > > > > > "predicate" but it almost did. That's why I suggested > > > "?predicate" which I > > > > > know is a bit > > > > > cryptic, but it's also not going to clash. > > > > > > > > > > Here are two serving suggestions: > > > > > > > > > > A) "?predicate" to name the predicate alias to use > > > > > transforms: t2 > > > > > transforms.t2.?predicate: has-my-prefix > > > > > transforms.t2.type:ExtractField$Key > > > > > transforms.t2.field: c1 > > > > > > > > > > B) "?" to name the predicate alias to use > > > > > transforms: t2 > > > > > transforms.t2.?: has-my-prefix > > > > > transforms.t2.type:ExtractField$Key > > > > > transforms.t2.field: c1 > > > > > > > > > > > > > > > One more question. > > > > > > > > > > Do people think the negation ought to be in the predicate > > > config or in the > > > > > referring transformation > > > > > config? We might want to make: > > > > > > > > > > predicate(a)=>transform(T1) > > > > > predicate(negate(a))=>transform(T2) > > > > > > > > > > easier than having to use two separate predicate aliases > with > > > identical > > > > > configuration, except for > > > > > the negation. > > > > > > > > > > Thanks, > > > > > Andrew > > > > > > > > > > On 04/05/2020, 10:22, "Tom Bentley" <tbent...@redhat.com> > > > wrote: > > > > > > > > > > Hi Chris, > > > > > > > > > > Thanks for the feedback. On points 1 and 2 I agree. > > > > > > > > > > 3. This is a great point. I'm happy to add this to the > KIP > > > too, but > > > > > note it > > > > > removes the need to add validate(Map<String, String>) to > > > both > > > > > Transformation and Predicate. That doesn't mean we > > > shouldn't still do > > > > > it, > > > > > just that the change is not motivated by the features > > being > > > added in > > > > > the > > > > > KIP. So I've removed validate(Map<String, String>) for > > now. > > > > > > > > > > 4. Your proposal would further reduce the chance of > > > Transformation > > > > > config > > > > > conflict (because just the 'predicate' parameter might > > > conflict), but > > > > > it > > > > > also adds the chance of conflict with Connector > parameters > > > (all the > > > > > 'predicates.*' parameters). Connector implementations > are > > > far more > > > > > numerous > > > > > than Transformation implementations, so the risk is > > > proportionately > > > > > greater. As you say we could try to reduce the chances > of > > > conflict > > > > > (e.g. > > > > > '?predicate' parameter on transformations and > > '?predicates' > > > prefix on > > > > > connectors), but I'm still left with uneasy feelings. > I'd > > > like to hear > > > > > about what others think before I change this one. > > > > > > > > > > Cheers, > > > > > > > > > > Tom > > > > > > > > > > > > > > > On Mon, May 4, 2020 at 4:57 AM Christopher Egerton < > > > > > chr...@confluent.io> > > > > > wrote: > > > > > > > > > > > Hi all, > > > > > > > > > > > > This has certainly taken quite a turn! I wish we could > > > get this done > > > > > > without adding another pluggable interface but if the > > > benefit is > > > > > that now > > > > > > any SMT--new or pre-existing--can be conditionally > > > applied, it seems > > > > > like > > > > > > it might be worth it. > > > > > > > > > > > > As far as the proposed design goes, some thoughts: > > > > > > > > > > > > 1. There isn't a clear reason why exceptions thrown > from > > > predicates > > > > > should > > > > > > be treated differently from exceptions thrown by > > > transformations. We > > > > > have > > > > > > reasonable error-tolerance mechanisms in the Connect > > > framework; why > > > > > > silently swallow errors, especially if the risk is > > > publishing > > > > > potentially > > > > > > corrupted data to Kafka or an external system because > an > > > SMT that > > > > > should > > > > > > have been applied wasn't? > > > > > > > > > > > > 2. It might be worth noting that the interface is a > > > worker plugin, > > > > > and that > > > > > > presumably it would be loaded in the same way as other > > > worker > > > > > plugins such > > > > > > as converters, connectors, and REST extensions. This > > > would include > > > > > aliasing > > > > > > behavior that would allow users to specify predicates > > > using their > > > > > simple > > > > > > class names as long as no two predicate plugins with > the > > > same simple > > > > > name > > > > > > were available on the worker. > > > > > > > > > > > > 3. Why would the Filter SMT explicitly take in a > > > predicate in its > > > > > > configuration if predicates can be applied to all > SMTs? > > > Just reading > > > > > the > > > > > > idea for a filter SMT, it seemed like the behavior > would > > > be that the > > > > > SMT > > > > > > would take in no configuration parameters and just > > > blindly drop > > > > > everything > > > > > > that it sees, but typical use would involve pairing it > > > with a > > > > > predicate. > > > > > > > > > > > > 4. The question mark syntax seems a little cryptic, > and > > > combining the > > > > > > configuration properties for the predicate and the > > > transformation in > > > > > the > > > > > > same namespace ("transforms.<name>.*) seems a little > > > noisy. What do > > > > > you > > > > > > think about allowing predicates to be configured in > > their > > > own > > > > > namespace, > > > > > > and referenced by name in a single "predicate" (or > maybe > > > > > "?predicate" if we > > > > > > really want to avoid risking backwards compatibility > > > concerns) > > > > > property? > > > > > > This would also enable them to be more easily reused > > > across several > > > > > SMTs. > > > > > > > > > > > > For example, you might configure a worker with these > > > properties > > > > > (assuming > > > > > > plugin aliasing is supported for predicates in the > same > > > way that it > > > > > is for > > > > > > transformations): > > > > > > > > > > > > transforms: t2 > > > > > > transforms.t2.predicate: has-my-prefix > > > > > > transforms.t2.type:ExtractField$Key > > > > > > transforms.t2.field: c1 > > > > > > > > > > > > predicates: has-my-prefix > > > > > > predicates.has-my-prefix.type: TopicNameMatch > > > > > > predicates.has-my-prefix.negate: true > > > > > > predicates.has-my-prefix.pattern: my-prefix-.* > > > > > > > > > > > > Excited to see how this is evolving and I think we're > > > heading towards > > > > > > something really useful for the Connect framework. > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Chris > > > > > > > > > > > > > > > > > > On Fri, May 1, 2020 at 9:51 AM Andrew Schofield < > > > > > andrew_schofi...@live.com > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi Tom, > > > > > > > Looks good to me. > > > > > > > > > > > > > > Thanks, > > > > > > > Andrew > > > > > > > > > > > > > > On 01/05/2020, 16:24, "Tom Bentley" < > > > tbent...@redhat.com> wrote: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I've updated the KIP to reflect recent > > discussions. > > > Please > > > > > note the > > > > > > > following: > > > > > > > > > > > > > > 1. I've described a HasHeaderKey predicate > rather > > > than > > > > > > HeaderKeyMatches > > > > > > > because at the moment Headers doesn't expose the > > > whole set of > > > > > > headers, > > > > > > > only > > > > > > > those with a specific name. Exposing this would > > > significantly > > > > > > increase > > > > > > > the > > > > > > > scope of this KIP but relatively little extra > > > benefit. > > > > > > > 2. I used > > > org.apache.kafka.connect.transformer.predicates > > > > > rather than > > > > > > > org.apache.kafka.connect.predicates, since I > > > thought it better > > > > > > > reflected > > > > > > > the use of predicates within transforms. I'm > > > flexible on this > > > > > one if > > > > > > > someone has a good case for the original name. > For > > > example the > > > > > > original > > > > > > > package name would be more appropriate if we > were > > > expecting > > > > > > Connectors > > > > > > > to > > > > > > > make use of Predicates somehow. > > > > > > > 3. I've dropped the special case of not needing > to > > > provide a > > > > > FQN for > > > > > > > predicate classes in that package, since this > > isn't > > > something > > > > > > > supported by > > > > > > > transformations themselves as far as I know. I > > like > > > the idea, > > > > > but it > > > > > > > seemed > > > > > > > inconsistent to need it for transformations but > > not > > > for > > > > > predicates. > > > > > > > 4. Chris, I've used your suggestions for a > > > validate() method > > > > > for the > > > > > > > extra > > > > > > > configurability needed. I've also added this to > > > Predicate (even > > > > > > though > > > > > > > I > > > > > > > don't need it) for consistency with Connector > and > > > > > Transformation. > > > > > > > 5. For negating the result of a predicate I > > decided > > > it was > > > > > clearer to > > > > > > > just > > > > > > > have a "negate" config parameter, supported by > all > > > predicates, > > > > > than > > > > > > to > > > > > > > reach for more punctuation. > > > > > > > > > > > > > > I'd be grateful for any more feedback people > might > > > have. > > > > > > > > > > > > > > Kind regards, > > > > > > > > > > > > > > Tom > > > > > > > > > > > > > > On Tue, Apr 28, 2020 at 3:50 PM Andrew > Schofield < > > > > > > > andrew_schofi...@live.com> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Tom, > > > > > > > > I think we should go for a consistent naming > > > convention for > > > > > the > > > > > > > > predicates, maybe so they > > > > > > > > read nicely if you say "IF" first. Here's a > > > suggestion for > > > > > > > predicates that > > > > > > > > the KIP could introduce: > > > > > > > > - TopicNameMatches > > > > > > > > - HeaderKeyMatches > > > > > > > > - RecordIsTombstone > > > > > > > > > > > > > > > > On naming, I also suggest using > > > > > > > > org.apache.kafka.connect.predicates.Predicate > > > > > > > > as the name for the interface. > > > > > > > > > > > > > > > > I hadn't settled on a preference for > negation. I > > > suppose > > > > > there are > > > > > > > three > > > > > > > > results > > > > > > > > from evaluating a predicate - true, false or > an > > > exception > > > > > caught by > > > > > > > Kafka > > > > > > > > Connect. > > > > > > > > I suggest the exception case is always a > > > predicate fail so > > > > > the > > > > > > > guarded > > > > > > > > transformation > > > > > > > > is skipped. Initially I'd preferred having the > > > predicate be > > > > > > > configured to > > > > > > > > return a > > > > > > > > negated result to give a "not match" > behaviour. > > > Thinking > > > > > again > > > > > > now, I > > > > > > > > prefer something like: > > > > > > > > > > > > > > > > transforms: t1 > > > > > > > > transforms.t1.?type: RecordIsTombstone > > > > > > > > transforms.t1.type: > > > > > > > > org.apache.kafka.connect.transforms.InsertField$Value > > > > > > > > transforms.t1.static.field: deleted > > > > > > > > transforms.t1.static.value: true > > > > > > > > > > > > > > > > The "?" means the transform guarded by the > > > predicate runs in > > > > > the > > > > > > > Predicate > > > > > > > > returns true. > > > > > > > > > > > > > > > > transforms: t2 > > > > > > > > transforms.t2.?!type: > > > > > > > org.apache.kafka.connect.predicates.TopicNameMatch > > > > > > > > transforms.t2.?regex: my-prefix-.* > > > > > > > > transforms.t2.type: > > > > > > > org.apache.kafka.connect.transforms.ExtractField$Key > > > > > > > > transforms.t2.field: c1 > > > > > > > > > > > > > > > > The "?!" means the transform guarded by the > > > predicate runs > > > > > if the > > > > > > > Predicate > > > > > > > > returns false. > > > > > > > > > > > > > > > > I think adding a Filter SMT that uses a > > Predicate > > > is a nice > > > > > idea. > > > > > > > > > > > > > > > > Cheers, > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >