Thanks Joshua for this KIP, it looks like a useful addition. I have a few questions:
1. In Public Interfaces and Proposed Changes, the KIP mentions the current "field" setting will be renamed to "fields". Do you instead mean a new setting "fields" will be added and the existing setting will be marked deprecated? 2. What happens if a user specifies both "field" and "fields"? Is one setting taking precedence? Is it a bad configuration that will be rejected? Same for the format and format.input/format.output settings. 3. Can you also add the proposed documentation for the new settings in the KIP? Best, Mickael On Mon, Nov 22, 2021 at 11:37 AM Joshua Grisham <grishamj...@gmail.com> wrote: > > Hi Randall, thanks again for the great feedback! Sorry it took me a minute > to get back on this one--I have been a bit full up over the last few > months--but now I will try to brush it off and see if we can figure out if > it makes sense to move forward and possibly close this one out. > > Regarding backward compatibility -- the intention of my proposed changes > (including the candidate PR) was that it still is backward compatible, but > I realize that what I wrote in the KIP was more theoretical about what > could happen instead of what I actually did in the PR (that I actually made > it backward-compatible). So now I have updated the text in the KIP to > hopefully more accurately reflect this and I think/hope that it, along with > what is in the candidate PR, addresses everything you mentioned in your > first and second paragraph. > > For clarity here is exactly what I have done in the PR: > > First I saw that the way the config properties for TimestampConverter > lacked the same structure as it did for other SMTs (e.g. ReplaceField as > one example), especially since I was adding several new properties, so I > added the same ConfigName and ConfigDefault interfaces (instead of just > having various string attributes on the TimestampConverter class itself). > The existing public class string attributes FIELD_CONFIG, > TARGET_TYPE_CONFIG, and FORMAT_CONFIG all still exist and all still can be > continued to used by anyone else, they will just get warnings at build time > since I marked them with @Deprecated. I did remove the private class > strings since they were no longer used in the class but this should not > have any impact on compatibility. > > Second I added the new properties to CONFIG_DEF but kept the existing > properties there (so they still exist and can be used) and in the > configure() method I have used ConfigUtils.translateDeprecatedConfigs() to > alias the old "field" property to the new "fields" property automatically > just like was done in ReplaceField from the change from KIP-629 ( > https://github.com/apache/kafka/commit/eab61cad2c418786ab5e58aa1267f689d82a61d1 > ) > In the end it means that a user can still have their connector use the > "field" property but when the configure() method is called it will just put > whatever they set in "field" to "fields" automatically. > > Also the property "format" still behaves exactly the same as before if it > is used (it continues to function as a SimpleDateFormatter input and > output) it is just that there is an additional capability added by > voluntarily adopting the two new properties "format.input" and > "format.output". > > As far as I can tell, in theory someone could have an existing connector, > using TimestampConverter (or something else which uses the API even), > upgrade to this version, and not change anything else, and it still behaves > the same as before. But perhaps it should not be marked as @Deprecated to > avoid the warnings or otherwise is there a clearly defined deprecation > policy that can be followed? I tried to search and look in the Kafka docs > but could not find anything after several attempts and rabbit-holes > unfortunately (closest I seem to find is just a list or documentation on > what has been deprecated in each release). > > However please do say if there is anything else you think is missing either > in the implementation or now in the updated KIP regarding this! > > Also hopefully with my updates to the KIP it has addressed your other > comments or questions, but please feel free to give it another look when > you have time and I would welcome any feedback that you have. > > I also noticed there was one small change where my PR now has a merge > conflict but it should be pretty minimal to update as long as everything > else is still looking ok.. I will try to take a look at this soon as well > if I am able. > > Thanks again and have a great week! > > Best, > Joshua > > > > Den ons 28 juli 2021 kl 16:05 skrev Randall Hauch <rha...@gmail.com>: > > > Thanks for the contribution, Joshua. Overall I think this is a really good > > improvement to this connector, for all the reasons cited in the motivation > > section. I do have some requests, though. > > > > The biggest issue I have is that the change is not backward compatible. > > We've worked really hard to make sure that all Connect APIs (including > > configuration properties) are always backward compatible because this makes > > it trivial for users to upgrade from one Connect release to a newer one. > > After all, any users that are already using this SMT in their connectors > > should be able to upgrade Connect and still have the SMT behave exactly as > > it did in earlier releases -- unless they opt in to changing their > > configuration to use the new feature. So I would request that the KIP and > > proposed implementation be changed to not *change* existing configuration > > properties. We can of course introduce new config properties, but they > > should have a default that results in exactly the same old behavior as in > > earlier releases. A third option is to create a new SMT, though we should > > try really hard to just improve the existing SMT since that's generally a > > much better UX. Assuming we can make the existing transformation > > work, the option to create a new SMT should be documented in the Rejected > > Alternatives section, too. > > > > For example, there are (at least?) two options with regard to specifying > > the field names. 1) just keep `field` as the name even though a > > comma-separated list is allowed, or 2) add `fields` but keep (and > > deprecate?) the existing `field` configuration, using `fields` if defined > > or if `field` is not defined, and maybe even supporting comma separated > > lists of names in both fields. The same goes for `format` and > > `format.output`. In general, I think 2 *might* provide a better experience > > at the cost of higher implementation complexity. Regardless of which you > > choose, the KIP should explicitly propose one approach, and document the > > other(s) in the rejected alternatives section. After all, the KIP should be > > very clear about what will be proposed. > > > > In terms of deprecating the old configs in option 2, it'd be fine to do > > this in the documentation (and code, if applicable) as part of this > > feature. That means they could be removed at the earliest in the next major > > release after the feature is merged, which would be AK 4.0. And it would be > > good to mention this explicitly in the KIP so that we don't need another > > KIP just to remove the deprecation; so something like "This configuration > > property will be deprecated and documented as such, and eventually removed > > in a subsequent major release." > > > > Supporting nested fields would also be a nice touch here, too. However, as > > you mention, we'd likely want to make the same change to other SMTs, and as > > such we might consider adding support for nested names on all SMTs with a > > separate KIP. The two KIPs could be written such that they are independent. > > > > Finally, a KIP need not go into much detail about the implementation, and > > should instead focus on the API and behavioral changes. For example, it's > > probably not necessary for the KIP to mention that the implementation would > > use DateTimeFormatter. However, it would be worth mentioning explicitly > > that the `format.input` configuration property will take a regular > > expression that is compatible with the JDK's DateTimeFormatter.ofPattern > > method > > -- this very clearly states the constraint on the input values. > > > > Again, very nice job on the KIP so far! > > > > Best regards, > > > > Randall > > > > On Sat, May 15, 2021 at 6:50 AM Joshua Grisham <grishamj...@gmail.com> > > wrote: > > > > > Hello again Kafka Developers community! > > > > > > I thought it has been some time and I would try to dust KIP-682 off and > > see > > > if anyone wanted to take a discussion on them, if it still makes sense or > > > if this should go in a different direction. > > > > > > Anyone have thoughts on this? That Connect's TimestampConverter could be > > > updated to support converting multiple fields in one pass of the > > > transformation (instead of chaining multiple instances together if you > > have > > > multiple timestamps to convert) and that it could support a regex-like > > > pattern for matching string input instead of only allowing one string > > > format and expect that all producers are sending in 100% the same format? > > > > > > Since some time has gone there are a few conflicts in my proposed PR but > > > they seem quite trivial (some error message and formatting of a test > > result > > > from a quick glance) so it should not be hard to resolve them. > > > > > > Thanks! > > > > > > Best, > > > Joshua > > > > > > > > > On Sat, Nov 7, 2020 at 9:24 PM Joshua Grisham <grishamj...@gmail.com> > > > wrote: > > > > > > > Hi Brandon, > > > > > > > > I have added what i called "recursive" support to find child fields on > > > any > > > > level to the Cast and ReplaceField transforms as well, the PR is here > > > > https://github.com/apache/kafka/pull/9493 > > > > > > > > I plan to try and write up a KIP for that one as well maybe in the next > > > > day or two. It is not exactly the same as specifically targeting one > > > nested > > > > field based on its full path, but instead will look for the field name > > at > > > > any level of the structure and cast/replace all of them if they match. > > > > > > > > The same could be added in theory to almost all of the transforms i > > guess > > > > but maybe that is for another discussion 😊 > > > > > > > > But for now with this TimestampConverter the only thing which I have > > > > proposed are the two mentioned before - multiple fields and support for > > > > pattern like matching of string timestamps. > > > > > > > > Joshua > > > > > > > > Den lör 7 nov. 2020 18:42Brandon Brown <bran...@bbrownsound.com> > > skrev: > > > > > > > >> I love this idea! I have a current KIP for a hash transform and am > > > >> working adding multi field/nested support to that one. I can think of > > a > > > few > > > >> times we’ve needed functionality like that. I’d add that adding > > support > > > for > > > >> transforming nested fields would be a great feature for this. > > > >> > > > >> -Brandon > > > >> > > > >> Brandon Brown > > > >> > On Nov 7, 2020, at 11:11 AM, Joshua Grisham <grishamj...@gmail.com> > > > >> wrote: > > > >> > > > > >> > Hello everyone! > > > >> > (Thanks to Tom Bentley for directing me in this direction!) > > > >> > > > > >> > I have made a series of changes to some of the standard Connect > > > >> transforms > > > >> > to meet some of the challenges at my company to consume messages > > using > > > >> > Connect, and have been running them for a few weeks now as custom > > > SMTs. > > > >> > > > > >> > I realized that several of these changes might actually be really > > good > > > >> > features to be included in the standard transforms so I will open > > some > > > >> KIPs > > > >> > to go along with PRs which I already submitted (apologize for going > > a > > > >> bit > > > >> > backwards in the process as this was my first time working with the > > > >> Kafka > > > >> > project). > > > >> > > > > >> > The first one I have created a KIP for is KIP-682 on the > > > >> TimestampConverter > > > >> > transform. > > > >> > > > > >> > Basically the 2 limitations that I am proposing to remove are the > > > >> ability > > > >> > to only convert one field at a time, but instead convert multiple > > > fields > > > >> > using a common config, and that if any producer sends a slightly > > > >> different > > > >> > date/timestamp string format on any message then the transformation > > > >> > fails... so to instead add the ability to support variations in the > > > >> string > > > >> > input format. > > > >> > > > > >> > More info here: > > > >> > > > > >> > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-682%3A+Connect+TimestampConverter+support+for+multiple+fields+and+multiple+input+formats > > > >> > > > > >> > I would appreciate any kind of discussion to make improvements and > > it > > > >> would > > > >> > be great to see changes like this being pushed up. > > > >> > > > > >> > Thanks for now! > > > >> > > > > >> > Best regards, > > > >> > Joshua Grisham > > > >> > > > > > > > > >