Thank you, Chris! and sorry for the delayed response.

Please, find my comments below:

On Mon, 14 Feb 2022 at 17:34, Chris Egerton <chr...@confluent.io.invalid>
wrote:

> Hi Jorge,
>
> Thanks for the KIP! I'd love to see support for nested fields added to the
> out-of-the-box SMTs provided with Connect. Here are my initial thoughts:
>
> 1. I agree that there's a case to be made for expanding HoistField with a
> new config property for identifying a nested, to-be-hoisted field, but the
> example in the KIP doesn't really demonstrate why this would be valuable. I
> think it'd be helpful to expand the example to add other fields in order to
> show how adding nested field support enables users to hoist a nested field
> without dropping other fields from the value. Maybe something like this:
>
>     source = nested.val
>     field = line
>
>     value (before):
>         {
>             "nested": {
>                 "val": 42,
>                 "other val": 96
>             }
>         }
>
>     value (after):
>         {
>             "nested": {
>                 "line": {
>                     "val": 42,
>                 }
>                 "other val": 96
>             }
>         }
>
> 2. Nit: I think "source" is a little strange for the new HoistField
> property name. Maybe "hoisted" or "hoisted.field" would be more
> descriptive?
>
>
About 1. and 2.:
Agree. The example for this SMT is updated and have added the `hoisted`
configuration.


> 3. Is there a reasonable use case for expanding Flatten to be able to
> flatten specific fields? My understanding is that it's mostly useful for
> writing to systems like databases that don't support nested values and
> require everything to be a flat list of key-value pairs. Being able to
> flatten a nested field wouldn't provide any advantage for that use case.
> Are there other cases where it would?
>
> 4. I don't think we should unconditionally change the default delimiter for
> Flatten. It's a backwards-incompatible, breaking change that could cause
> headaches for users. It might be reasonable to change the default value
> dynamically based on whether the user has specified a value for the "field"
> property, but considering the motivation for changing the default is that
> it creates conflicts with the to-be-introduced nested field syntax (which
> could arise with downstream SMTs regardless of whether the user has
> explicitly configured Flatten with the "field" property), I don't know that
> this would be too useful either. I have some thoughts below on how to
> handle possible conflicts between names with dots in their names and dotted
> syntax for nested field references that should hopefully make either change
> unnecessary.
>
>
Fair enough. With the support for nested fields in other SMTs, Flatten
could stay as it is.
This removes the need for (4) changing Flatten config as well.


> 5. I think it's fine to expand ExtractField to support nested notation, but
> it might be worth noting in the rejected alternatives section that this
> isn't strictly necessary since you can replace any single invocation of
> that SMT that uses nested field notation with multiple invocations of it
> that use non-nested notation.
>

Agree. Adding it.


>
> 6. Nit: "RegerRouter" should be "RegexRouter" in the list of SMTs that do
> not require nested structure support.
>
>
Ack. Fixing it.


> 7. It may be rare for dots in field names to occur in the wild (although I
> wouldn't be so certain of this), but unless we want to inflict headaches on
> users of Flatten, I think we're going to have to think about conflicts
> between dotted notation and non-nested fields whose names contain dots. I
> don't think this is actually such a bad thing, though. I agree that dotted
> notation is intuitive and pretty commonplace (in tools like jq, for
> example), so I'd like it if we could stick to it. What about introducing
> escape syntax, using a backslash? That way, users could disambiguate
> between "this.field" (which would refer to the nested field "field" under
> the top-level "this" field), and "this\.field" (which would refer to the
> field named "this.field"). Like with most languages that use the backslash
> for escape sequences, it could also be used to escape itself, in the event
> that a field name contains a backslash. I think this is more intuitive and
> simpler than, e.g., adding a new config property to toggle the delimiter to
> be used when parsing nested field references.
>

I like this approach. Adding to the KIP.


>
> 8. I don't think we can unconditionally turn this feature on. The risk of
> breaking existing pipelines (especially ones that involve, for example, a
> combination of the Flatten and Cast SMTs) is pretty high. I think this
> should be an opt-in feature, at least until the next major release. One way
> we could accomplish this is by introducing a new "field.style" (name
> obviously subject to change) property with values of "plain" (default) and
> "nested". If set to "plain" then the current non-nested behavior is used,
> and if set to "nested", then the proposed nested behavior is. We can
> consider updating the default value to "nested" in a future major release
> (or even codify that plan in this KIP, if there's enough support for it).
> This would also leave the door open for adding recursive support to SMTs in
> the future by adding a permitted value of "recursive".
>
> 9. One of the linked tickets in the "Motivation" section,
> https://issues.apache.org/jira/browse/KAFKA-10640, has an open PR and KIP
> that propose adding recursive support to some SMTs. Have you considered
> this type of functionality for your KIP? Or is your aim to stick solely to
> nested field support?
>

I like the `field.style` configuration flag approach.
Thanks for pointing out the `recursive` approach. Will add `nested` at the
moment, let's check the demand for `recursive` to consider it as part of
this or another KIP.

I have added the following on the KIP:

```
Future KIPs could extend this support for:

- Recursive notation: name a field and apply it to all fields across the
schema matching that name.
- Access to arrays: Adding `[]` notation to represent access to arrays and
applying SMTs to fields within an array.
```


>
> Cheers,
>
> Chris
>
> On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi Dev team,
> >
> > I'd like to start a new discussion thread on Kafka Connect KIP-821:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >
> > This KIP is aimed to include support for nested structures on the
> existing
> > SMTs — where this make sense.
> >
> > Looking forward to your feedback.
> >
> > Regards,
> > Jorge.
> >
>

Reply via email to