Thank you, Chris! and sorry for the delayed response. Please, find my comments below:
On Mon, 14 Feb 2022 at 17:34, Chris Egerton <chr...@confluent.io.invalid> wrote: > Hi Jorge, > > Thanks for the KIP! I'd love to see support for nested fields added to the > out-of-the-box SMTs provided with Connect. Here are my initial thoughts: > > 1. I agree that there's a case to be made for expanding HoistField with a > new config property for identifying a nested, to-be-hoisted field, but the > example in the KIP doesn't really demonstrate why this would be valuable. I > think it'd be helpful to expand the example to add other fields in order to > show how adding nested field support enables users to hoist a nested field > without dropping other fields from the value. Maybe something like this: > > source = nested.val > field = line > > value (before): > { > "nested": { > "val": 42, > "other val": 96 > } > } > > value (after): > { > "nested": { > "line": { > "val": 42, > } > "other val": 96 > } > } > > 2. Nit: I think "source" is a little strange for the new HoistField > property name. Maybe "hoisted" or "hoisted.field" would be more > descriptive? > > About 1. and 2.: Agree. The example for this SMT is updated and have added the `hoisted` configuration. > 3. Is there a reasonable use case for expanding Flatten to be able to > flatten specific fields? My understanding is that it's mostly useful for > writing to systems like databases that don't support nested values and > require everything to be a flat list of key-value pairs. Being able to > flatten a nested field wouldn't provide any advantage for that use case. > Are there other cases where it would? > > 4. I don't think we should unconditionally change the default delimiter for > Flatten. It's a backwards-incompatible, breaking change that could cause > headaches for users. It might be reasonable to change the default value > dynamically based on whether the user has specified a value for the "field" > property, but considering the motivation for changing the default is that > it creates conflicts with the to-be-introduced nested field syntax (which > could arise with downstream SMTs regardless of whether the user has > explicitly configured Flatten with the "field" property), I don't know that > this would be too useful either. I have some thoughts below on how to > handle possible conflicts between names with dots in their names and dotted > syntax for nested field references that should hopefully make either change > unnecessary. > > Fair enough. With the support for nested fields in other SMTs, Flatten could stay as it is. This removes the need for (4) changing Flatten config as well. > 5. I think it's fine to expand ExtractField to support nested notation, but > it might be worth noting in the rejected alternatives section that this > isn't strictly necessary since you can replace any single invocation of > that SMT that uses nested field notation with multiple invocations of it > that use non-nested notation. > Agree. Adding it. > > 6. Nit: "RegerRouter" should be "RegexRouter" in the list of SMTs that do > not require nested structure support. > > Ack. Fixing it. > 7. It may be rare for dots in field names to occur in the wild (although I > wouldn't be so certain of this), but unless we want to inflict headaches on > users of Flatten, I think we're going to have to think about conflicts > between dotted notation and non-nested fields whose names contain dots. I > don't think this is actually such a bad thing, though. I agree that dotted > notation is intuitive and pretty commonplace (in tools like jq, for > example), so I'd like it if we could stick to it. What about introducing > escape syntax, using a backslash? That way, users could disambiguate > between "this.field" (which would refer to the nested field "field" under > the top-level "this" field), and "this\.field" (which would refer to the > field named "this.field"). Like with most languages that use the backslash > for escape sequences, it could also be used to escape itself, in the event > that a field name contains a backslash. I think this is more intuitive and > simpler than, e.g., adding a new config property to toggle the delimiter to > be used when parsing nested field references. > I like this approach. Adding to the KIP. > > 8. I don't think we can unconditionally turn this feature on. The risk of > breaking existing pipelines (especially ones that involve, for example, a > combination of the Flatten and Cast SMTs) is pretty high. I think this > should be an opt-in feature, at least until the next major release. One way > we could accomplish this is by introducing a new "field.style" (name > obviously subject to change) property with values of "plain" (default) and > "nested". If set to "plain" then the current non-nested behavior is used, > and if set to "nested", then the proposed nested behavior is. We can > consider updating the default value to "nested" in a future major release > (or even codify that plan in this KIP, if there's enough support for it). > This would also leave the door open for adding recursive support to SMTs in > the future by adding a permitted value of "recursive". > > 9. One of the linked tickets in the "Motivation" section, > https://issues.apache.org/jira/browse/KAFKA-10640, has an open PR and KIP > that propose adding recursive support to some SMTs. Have you considered > this type of functionality for your KIP? Or is your aim to stick solely to > nested field support? > I like the `field.style` configuration flag approach. Thanks for pointing out the `recursive` approach. Will add `nested` at the moment, let's check the demand for `recursive` to consider it as part of this or another KIP. I have added the following on the KIP: ``` Future KIPs could extend this support for: - Recursive notation: name a field and apply it to all fields across the schema matching that name. - Access to arrays: Adding `[]` notation to represent access to arrays and applying SMTs to fields within an array. ``` > > Cheers, > > Chris > > On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban Quilcate Otoya < > quilcate.jo...@gmail.com> wrote: > > > Hi Dev team, > > > > I'd like to start a new discussion thread on Kafka Connect KIP-821: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures > > > > This KIP is aimed to include support for nested structures on the > existing > > SMTs — where this make sense. > > > > Looking forward to your feedback. > > > > Regards, > > Jorge. > > >