Hi Mickael,

Thanks for the KIP!

While it's tempting to add this field to the out-of-the-box connector
config def, I'm a little hesitant, for two reasons.

1. Adding this directly to the connector config could have unintended
consequences on the actual data processing by the connector. Any time a
connector's config is modified, the Connector object running for it is
restarted with that new config. In most cases this is a trivial operation
since we have incremental rebalancing enabled by default, the connector can
(and probably should!) generate task configs that are functionally
identical to the ones it last generated, and most (though not all)
Connector classes are fairly lightweight and leave the real work to their
Task class. However, due to KAFKA-9228 [1], it's not just common practice
for connectors to perform transparent passthrough of most of their configs
when generating task configs, it's actually necessary to work around a bug
in the runtime. As a result, tweaking the description of a connector would
be fairly likely to result in a full restart of all of its tasks, in
addition to triggering two rebalances (which may not be so lightweight if
users are still running with eager rebalancing... which, sadly, I've heard
is still happening today).

2. The motivation section mentions some information that might go into the
description field, such as the team that owns the connector and emergency
contact info. It seems like this info might benefit from a little more
structure if we're trying to design for programmatic access by GUIs and
CLIs (which I'm assuming is the case, since I can't imagine a human being
getting much use out of the raw output from the GET
/connector-plugins/{name}/config and PUT
/connector-plugins/{name}/config/validate endpoints). This might also make
it easier to add custom validation logic around what kind of information is
present via REST extension.


With these thoughts in mind, what do you think about adding a new generic
"tags" object to connectors that can support arbitrary user-provided
key/value pairs? If using the POST /connectors endpoint, it might look
something like this:

{
  "name": "my-connector",
  "config": {
    "connector.class": "MirrorSource",
    "tasks.max": "908"
  },
  "tags": {
    "team": "data-infra",
    "phone": "12345678901",
    "email": "din...@example.org"
  }
}

And, to allow users to modify connector tags after one has been created, we
might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
verbs that writes tag info for a connector to the config topic, but without
altering the actual connector config (allowing us to skip a rebalance
altogether).

One other thing we could consider is allowing cluster administrators to
require that connectors are created with a certain set of tags, or even add
per-tag regex validation. They might specify something like
"connector.required.tags = team, phone, email" or
"connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
is probably overboard for now, especially since it's already possible to
accomplish via a REST extension.

Finally, I'm also wondering if, in pursuit of expanding Connect's
out-of-the-box support for dealing with connector metadata, we might want
to expose created/last-updated times for connector configurations. We
definitely don't have to do that in this KIP but if you agree that this
would be useful, we should probably keep it in mind to avoid painting
ourselves into a corner. This is why I'm thinking of using "tags" instead
of something a little more generic like "metadata", BTW.

[1] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration

Cheers,

Chris

On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mickael.mai...@gmail.com>
wrote:

> Hi,
>
> I created a very small KIP to add a description field to connectors:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>
> Let me know if you have any feedback.
>
> Thanks,
> Mickael
>

Reply via email to