Thanks, Chris! Much appreciated all the feedback here.

1. You nailed it setting the design goal here: "it shouldn't be impossible
to use this new feature for any field name, no matter how convoluted. It's
fine if edge cases introduce difficulty (such as less-readable
configurations), but it's not fine if they can't be addressed at all."
Back to the previous proposals (using only dots as separators) we have 2
alternatives:
1. escaping with backslashes
2. escaping with dots itself

I'll lean for alternative 2, as you proposed before. Feels to me that
backslashes have more potential to lead to confusion in JSON configs, and
CSV seems like a good precedent to use the same character to escape itself.
KIP is updated to reflect this.

2. Thanks! I'll add an example, and stick with the current approach
defining the style per individual transform configuration.

3. Yes, thanks! KIP updated.

4. Of course. KIP updated.

On Mon, 28 Mar 2022 at 21:59, Chris Egerton <fearthecel...@gmail.com> wrote:

> Hi Jorge,
>
> Thanks for addressing my comments; the KIP looks up-to-date and pretty
> readable now, and the rejected alternatives section does a great job of
> outlining the discussion so far and providing context for anyone else who
> might want to join in.
>
> 1. Thoughts on choice of delimiter:
> - I like the optimization for simple cases, but I think the new proposal is
> a little too restrictive. What if there's a field whose name contains all
> of the permitted options (currently just ".", ",", and "/")?
> - If we expand the set of permitted delimiters to allow for any
> single-character string, configuration complexity will increase and
> readability may decrease
> - Also worth pointing out that there is some convention for doubling a
> delimiter character as an escape mechanism with formats like CSV [1]
> - Overall I think we may be approaching the saturation point for productive
> discussion on delimiter syntax so I don't want to spend too much more of
> your time on it. I think the one point I'd like to leave for now is that it
> shouldn't be impossible to use this new feature for any field name, no
> matter how convoluted. It's fine if edge cases introduce difficulty (such
> as less-readable configurations), but it's not fine if they can't be
> addressed at all.
>
> 2.
> The configuration style where you define "transforms.field.style" in the
> connector config, and then this applies to all SMTs for the connector, is
> very interesting. However, it doesn't follow convention for existing SMTs.
> Right now, if you want to configure an SMT, you define its name in the
> connector config (for example, "transforms": "smt1"), and then define all
> of the properties for that SMT in the connector config using a namespacing
> mechanism specific to that SMT (for example, "transforms.smt1.prop1":
> "val1"). That SMT then sees only the properties defined in that namespace,
> with the prefix stripped (for example, "prop1": "val1") in its configure
> [2] [3] method.
> If we want to continue to follow this convention, then instead of
> specifying "transforms.field.style" in a connector config, we would expect
> users to configure "transforms.<name>.field.style", for each SMT that they
> want to configure a field style for. This would require more work on the
> part of the user, but would be simpler to reason about and easier to
> implement.
> If we want to explore an alternative where users can specify global
> properties that apply to all transforms in a connector config, then the
> semantics for this need to be defined in the KIP. This would have to
> include whether this will apply only for the special case of the
> "field.style" and possibly "field.separator" properties or if it would be
> available more generally for other properties, whether it will apply only
> for the SMTs outlined in the KIP or if the "field.style" and possibly
> "field.separator" properties would also be passed into custom SMTs so that
> they could choose to act on them if applicable, how edge cases like having
> an SMT named "field" in your connector config would be handled, etc.
> Either way, it might help to have an example in the KIP outlining how one
> of the to-be-augmented SMTs can be configured with this new feature and a
> before/after of how a record value would be transformed with that
> configuration.
>
> 3. The docstring for the "transforms.field.style" property mentions that
> the permitted values are "plain" and "nested", but then describes behavior
> with a value of "root". Should that be "plain" instead?
>
> 4. The docstring for the "transforms.field.separator" property exclusively
> mentions structs, but the feature is intended to work with maps as well.
> Can we update it to reflect this?
>
> References:
>
> [1] - https://stackoverflow.com/a/17808731
> [2] -
>
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/connect/api/src/main/java/org/apache/kafka/connect/transforms/Transformation.java#L30
> [3] -
>
> https://github.com/apache/kafka/blob/7243facb8d69a7252e6b9556b5eaee13e41bab7f/clients/src/main/java/org/apache/kafka/common/Configurable.java#L26-L29
>
> Cheers,
>
> Chris
>
> On Mon, Mar 28, 2022 at 1:32 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Thanks, Chris!
> >
> > 1. I'd argue "this..field.child" could be harder to grasp than
> > "this.field/child" + separator: "/".
> > Even though this represents additional information, it follows a similar
> > approach as the "Flatten#delimeter" configuration.
> > I want to give the separator approach another try, so I have updated the
> > KIP with the separator proposal, sticking to only 2 alternatives that
> > should hopefully cover most scenarios.
> >
> > 2. Agree. KIP has been updated with this improvement.
> >
> > 3. You're right. I have updated this section accordingly.
> >
> > 4. Good catch! I've replaced it with `DropHeaders`.
> >
> > Looking forward to your feedback.
> >
> > Thanks,
> > Jorge.
> >
> > On Wed, 9 Mar 2022 at 21:33, Chris Egerton <fearthecel...@gmail.com>
> > wrote:
> >
> > > Hi Jorge,
> > >
> > > Looking good! Got a few more thoughts.
> > >
> > > 1. Sorry to revisit this, but I think we may want to adopt a slightly
> > > different escape syntax style. Backslashes are great, but since they're
> > > already used by JSON, using them as an escape sequence in field
> notation
> > > would also lead to some pretty ugly connector configs. Anyone who's had
> > to
> > > write regular expressions with backslashes in Java is probably already
> > > familiar with this: "this\\\\.is\\\\.not\\\\.very\\\\.readable". What
> do
> > > you think about using the dot character to escape itself? In other
> words,
> > > to access a single field named "this.field", instead of using the
> syntax
> > > "this\.field" (which in JSON would have to be expressed as
> > "this\\.field"),
> > > we could use "this..field", and for a single field named "this\field",
> > > instead of using the syntax "this\\field" (or, in JSON,
> "this\\\\field"),
> > > we could use "this\field" (or, in JSON, "this\\field").
> > >
> > > 2. Could you flesh out the details on the new "field.style" property,
> > > including the type, default value, importance, and a preliminary
> > docstring?
> > > See
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors#KIP618:ExactlyOnceSupportforSourceConnectors-Newproperties
> > > for an example.
> > >
> > > 3. Is the "Compatibility, Deprecation, and Migration Plan" section
> still
> > > accurate after the latest update? Seems like it's still written with
> the
> > > assumption that nested field syntax will be hardcoded or opt-in, which
> > IIUC
> > > isn't the case anymore.
> > >
> > > 4. Nit: The "These SMTs do not require nested structure support"
> section
> > > mentions a "Drop" SMT. I think this may be referring to the Confluent
> > Drop
> > > SMT, which isn't a part of Apache Kafka. Should we drop (heh) that SMT
> > from
> > > the list? Or perhaps just replace it with "DropHeaders", which is
> > currently
> > > missing from the list and shouldn't require any nested-field related
> > > updates?
> > >
> >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Mon, Feb 28, 2022 at 2:12 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Thank you, Chris! and sorry for the delayed response.
> > > >
> > > > Please, find my comments below:
> > > >
> > > > On Mon, 14 Feb 2022 at 17:34, Chris Egerton
> > <chr...@confluent.io.invalid
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jorge,
> > > > >
> > > > > Thanks for the KIP! I'd love to see support for nested fields added
> > to
> > > > the
> > > > > out-of-the-box SMTs provided with Connect. Here are my initial
> > > thoughts:
> > > > >
> > > > > 1. I agree that there's a case to be made for expanding HoistField
> > > with a
> > > > > new config property for identifying a nested, to-be-hoisted field,
> > but
> > > > the
> > > > > example in the KIP doesn't really demonstrate why this would be
> > > > valuable. I
> > > > > think it'd be helpful to expand the example to add other fields in
> > > order
> > > > to
> > > > > show how adding nested field support enables users to hoist a
> nested
> > > > field
> > > > > without dropping other fields from the value. Maybe something like
> > > this:
> > > > >
> > > > >     source = nested.val
> > > > >     field = line
> > > > >
> > > > >     value (before):
> > > > >         {
> > > > >             "nested": {
> > > > >                 "val": 42,
> > > > >                 "other val": 96
> > > > >             }
> > > > >         }
> > > > >
> > > > >     value (after):
> > > > >         {
> > > > >             "nested": {
> > > > >                 "line": {
> > > > >                     "val": 42,
> > > > >                 }
> > > > >                 "other val": 96
> > > > >             }
> > > > >         }
> > > > >
> > > > > 2. Nit: I think "source" is a little strange for the new HoistField
> > > > > property name. Maybe "hoisted" or "hoisted.field" would be more
> > > > > descriptive?
> > > > >
> > > > >
> > > > About 1. and 2.:
> > > > Agree. The example for this SMT is updated and have added the
> `hoisted`
> > > > configuration.
> > > >
> > > >
> > > > > 3. Is there a reasonable use case for expanding Flatten to be able
> to
> > > > > flatten specific fields? My understanding is that it's mostly
> useful
> > > for
> > > > > writing to systems like databases that don't support nested values
> > and
> > > > > require everything to be a flat list of key-value pairs. Being able
> > to
> > > > > flatten a nested field wouldn't provide any advantage for that use
> > > case.
> > > > > Are there other cases where it would?
> > > > >
> > > > > 4. I don't think we should unconditionally change the default
> > delimiter
> > > > for
> > > > > Flatten. It's a backwards-incompatible, breaking change that could
> > > cause
> > > > > headaches for users. It might be reasonable to change the default
> > value
> > > > > dynamically based on whether the user has specified a value for the
> > > > "field"
> > > > > property, but considering the motivation for changing the default
> is
> > > that
> > > > > it creates conflicts with the to-be-introduced nested field syntax
> > > (which
> > > > > could arise with downstream SMTs regardless of whether the user has
> > > > > explicitly configured Flatten with the "field" property), I don't
> > know
> > > > that
> > > > > this would be too useful either. I have some thoughts below on how
> to
> > > > > handle possible conflicts between names with dots in their names
> and
> > > > dotted
> > > > > syntax for nested field references that should hopefully make
> either
> > > > change
> > > > > unnecessary.
> > > > >
> > > > >
> > > > Fair enough. With the support for nested fields in other SMTs,
> Flatten
> > > > could stay as it is.
> > > > This removes the need for (4) changing Flatten config as well.
> > > >
> > > >
> > > > > 5. I think it's fine to expand ExtractField to support nested
> > notation,
> > > > but
> > > > > it might be worth noting in the rejected alternatives section that
> > this
> > > > > isn't strictly necessary since you can replace any single
> invocation
> > of
> > > > > that SMT that uses nested field notation with multiple invocations
> of
> > > it
> > > > > that use non-nested notation.
> > > > >
> > > >
> > > > Agree. Adding it.
> > > >
> > > >
> > > > >
> > > > > 6. Nit: "RegerRouter" should be "RegexRouter" in the list of SMTs
> > that
> > > do
> > > > > not require nested structure support.
> > > > >
> > > > >
> > > > Ack. Fixing it.
> > > >
> > > >
> > > > > 7. It may be rare for dots in field names to occur in the wild
> > > (although
> > > > I
> > > > > wouldn't be so certain of this), but unless we want to inflict
> > > headaches
> > > > on
> > > > > users of Flatten, I think we're going to have to think about
> > conflicts
> > > > > between dotted notation and non-nested fields whose names contain
> > > dots. I
> > > > > don't think this is actually such a bad thing, though. I agree that
> > > > dotted
> > > > > notation is intuitive and pretty commonplace (in tools like jq, for
> > > > > example), so I'd like it if we could stick to it. What about
> > > introducing
> > > > > escape syntax, using a backslash? That way, users could
> disambiguate
> > > > > between "this.field" (which would refer to the nested field "field"
> > > under
> > > > > the top-level "this" field), and "this\.field" (which would refer
> to
> > > the
> > > > > field named "this.field"). Like with most languages that use the
> > > > backslash
> > > > > for escape sequences, it could also be used to escape itself, in
> the
> > > > event
> > > > > that a field name contains a backslash. I think this is more
> > intuitive
> > > > and
> > > > > simpler than, e.g., adding a new config property to toggle the
> > > delimiter
> > > > to
> > > > > be used when parsing nested field references.
> > > > >
> > > >
> > > > I like this approach. Adding to the KIP.
> > > >
> > > >
> > > > >
> > > > > 8. I don't think we can unconditionally turn this feature on. The
> > risk
> > > of
> > > > > breaking existing pipelines (especially ones that involve, for
> > > example, a
> > > > > combination of the Flatten and Cast SMTs) is pretty high. I think
> > this
> > > > > should be an opt-in feature, at least until the next major release.
> > One
> > > > way
> > > > > we could accomplish this is by introducing a new "field.style"
> (name
> > > > > obviously subject to change) property with values of "plain"
> > (default)
> > > > and
> > > > > "nested". If set to "plain" then the current non-nested behavior is
> > > used,
> > > > > and if set to "nested", then the proposed nested behavior is. We
> can
> > > > > consider updating the default value to "nested" in a future major
> > > release
> > > > > (or even codify that plan in this KIP, if there's enough support
> for
> > > it).
> > > > > This would also leave the door open for adding recursive support to
> > > SMTs
> > > > in
> > > > > the future by adding a permitted value of "recursive".
> > > > >
> > > > > 9. One of the linked tickets in the "Motivation" section,
> > > > > https://issues.apache.org/jira/browse/KAFKA-10640, has an open PR
> > and
> > > > KIP
> > > > > that propose adding recursive support to some SMTs. Have you
> > considered
> > > > > this type of functionality for your KIP? Or is your aim to stick
> > solely
> > > > to
> > > > > nested field support?
> > > > >
> > > >
> > > > I like the `field.style` configuration flag approach.
> > > > Thanks for pointing out the `recursive` approach. Will add `nested`
> at
> > > the
> > > > moment, let's check the demand for `recursive` to consider it as part
> > of
> > > > this or another KIP.
> > > >
> > > > I have added the following on the KIP:
> > > >
> > > > ```
> > > > Future KIPs could extend this support for:
> > > >
> > > > - Recursive notation: name a field and apply it to all fields across
> > the
> > > > schema matching that name.
> > > > - Access to arrays: Adding `[]` notation to represent access to
> arrays
> > > and
> > > > applying SMTs to fields within an array.
> > > > ```
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Tue, Feb 8, 2022 at 1:23 PM Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jo...@gmail.com> wrote:
> > > > >
> > > > > > Hi Dev team,
> > > > > >
> > > > > > I'd like to start a new discussion thread on Kafka Connect
> KIP-821:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> > > > > >
> > > > > > This KIP is aimed to include support for nested structures on the
> > > > > existing
> > > > > > SMTs — where this make sense.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Regards,
> > > > > > Jorge.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to