Thanks for the contribution, Joshua. Overall I think this is a really good improvement to this connector, for all the reasons cited in the motivation section. I do have some requests, though.
The biggest issue I have is that the change is not backward compatible. We've worked really hard to make sure that all Connect APIs (including configuration properties) are always backward compatible because this makes it trivial for users to upgrade from one Connect release to a newer one. After all, any users that are already using this SMT in their connectors should be able to upgrade Connect and still have the SMT behave exactly as it did in earlier releases -- unless they opt in to changing their configuration to use the new feature. So I would request that the KIP and proposed implementation be changed to not *change* existing configuration properties. We can of course introduce new config properties, but they should have a default that results in exactly the same old behavior as in earlier releases. A third option is to create a new SMT, though we should try really hard to just improve the existing SMT since that's generally a much better UX. Assuming we can make the existing transformation work, the option to create a new SMT should be documented in the Rejected Alternatives section, too. For example, there are (at least?) two options with regard to specifying the field names. 1) just keep `field` as the name even though a comma-separated list is allowed, or 2) add `fields` but keep (and deprecate?) the existing `field` configuration, using `fields` if defined or if `field` is not defined, and maybe even supporting comma separated lists of names in both fields. The same goes for `format` and `format.output`. In general, I think 2 *might* provide a better experience at the cost of higher implementation complexity. Regardless of which you choose, the KIP should explicitly propose one approach, and document the other(s) in the rejected alternatives section. After all, the KIP should be very clear about what will be proposed. In terms of deprecating the old configs in option 2, it'd be fine to do this in the documentation (and code, if applicable) as part of this feature. That means they could be removed at the earliest in the next major release after the feature is merged, which would be AK 4.0. And it would be good to mention this explicitly in the KIP so that we don't need another KIP just to remove the deprecation; so something like "This configuration property will be deprecated and documented as such, and eventually removed in a subsequent major release." Supporting nested fields would also be a nice touch here, too. However, as you mention, we'd likely want to make the same change to other SMTs, and as such we might consider adding support for nested names on all SMTs with a separate KIP. The two KIPs could be written such that they are independent. Finally, a KIP need not go into much detail about the implementation, and should instead focus on the API and behavioral changes. For example, it's probably not necessary for the KIP to mention that the implementation would use DateTimeFormatter. However, it would be worth mentioning explicitly that the `format.input` configuration property will take a regular expression that is compatible with the JDK's DateTimeFormatter.ofPattern method -- this very clearly states the constraint on the input values. Again, very nice job on the KIP so far! Best regards, Randall On Sat, May 15, 2021 at 6:50 AM Joshua Grisham <grishamj...@gmail.com> wrote: > Hello again Kafka Developers community! > > I thought it has been some time and I would try to dust KIP-682 off and see > if anyone wanted to take a discussion on them, if it still makes sense or > if this should go in a different direction. > > Anyone have thoughts on this? That Connect's TimestampConverter could be > updated to support converting multiple fields in one pass of the > transformation (instead of chaining multiple instances together if you have > multiple timestamps to convert) and that it could support a regex-like > pattern for matching string input instead of only allowing one string > format and expect that all producers are sending in 100% the same format? > > Since some time has gone there are a few conflicts in my proposed PR but > they seem quite trivial (some error message and formatting of a test result > from a quick glance) so it should not be hard to resolve them. > > Thanks! > > Best, > Joshua > > > On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <grishamj...@gmail.com> > wrote: > > > Hi Brandon, > > > > I have added what i called "recursive" support to find child fields on > any > > level to the Cast and ReplaceField transforms as well, the PR is here > > https://github.com/apache/kafka/pull/9493 > > > > I plan to try and write up a KIP for that one as well maybe in the next > > day or two. It is not exactly the same as specifically targeting one > nested > > field based on its full path, but instead will look for the field name at > > any level of the structure and cast/replace all of them if they match. > > > > The same could be added in theory to almost all of the transforms i guess > > but maybe that is for another discussion 😊 > > > > But for now with this TimestampConverter the only thing which I have > > proposed are the two mentioned before - multiple fields and support for > > pattern like matching of string timestamps. > > > > Joshua > > > > Den lör 7 nov. 2020 18:42Brandon Brown <bran...@bbrownsound.com> skrev: > > > >> I love this idea! I have a current KIP for a hash transform and am > >> working adding multi field/nested support to that one. I can think of a > few > >> times we’ve needed functionality like that. I’d add that adding support > for > >> transforming nested fields would be a great feature for this. > >> > >> -Brandon > >> > >> Brandon Brown > >> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <grishamj...@gmail.com> > >> wrote: > >> > > >> > Hello everyone! > >> > (Thanks to Tom Bentley for directing me in this direction!) > >> > > >> > I have made a series of changes to some of the standard Connect > >> transforms > >> > to meet some of the challenges at my company to consume messages using > >> > Connect, and have been running them for a few weeks now as custom > SMTs. > >> > > >> > I realized that several of these changes might actually be really good > >> > features to be included in the standard transforms so I will open some > >> KIPs > >> > to go along with PRs which I already submitted (apologize for going a > >> bit > >> > backwards in the process as this was my first time working with the > >> Kafka > >> > project). > >> > > >> > The first one I have created a KIP for is KIP-682 on the > >> TimestampConverter > >> > transform. > >> > > >> > Basically the 2 limitations that I am proposing to remove are the > >> ability > >> > to only convert one field at a time, but instead convert multiple > fields > >> > using a common config, and that if any producer sends a slightly > >> different > >> > date/timestamp string format on any message then the transformation > >> > fails... so to instead add the ability to support variations in the > >> string > >> > input format. > >> > > >> > More info here: > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats > >> > > >> > I would appreciate any kind of discussion to make improvements and it > >> would > >> > be great to see changes like this being pushed up. > >> > > >> > Thanks for now! > >> > > >> > Best regards, > >> > Joshua Grisham > >> > > >