Actually, I misspoke--a rebalance isn't triggered when an existing
connector's config is updated. Assuming the set of workers remains stable,
a rebalance is only necessary when a new connector is created, an existing
one is deleted, or a new set of task configs is generated.

This weakens the point about unnecessary rebalances when a connector's
description is updated, but doesn't entirely address it. Spurious
rebalances may still be triggered if a new set of task configs is
generated, which for reasons outlined above, is fairly likely.

On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <chr...@aiven.io> wrote:

> Hi Mickael,
>
> Thanks for the KIP!
>
> While it's tempting to add this field to the out-of-the-box connector
> config def, I'm a little hesitant, for two reasons.
>
> 1. Adding this directly to the connector config could have unintended
> consequences on the actual data processing by the connector. Any time a
> connector's config is modified, the Connector object running for it is
> restarted with that new config. In most cases this is a trivial operation
> since we have incremental rebalancing enabled by default, the connector can
> (and probably should!) generate task configs that are functionally
> identical to the ones it last generated, and most (though not all)
> Connector classes are fairly lightweight and leave the real work to their
> Task class. However, due to KAFKA-9228 [1], it's not just common practice
> for connectors to perform transparent passthrough of most of their configs
> when generating task configs, it's actually necessary to work around a bug
> in the runtime. As a result, tweaking the description of a connector would
> be fairly likely to result in a full restart of all of its tasks, in
> addition to triggering two rebalances (which may not be so lightweight if
> users are still running with eager rebalancing... which, sadly, I've heard
> is still happening today).
>
> 2. The motivation section mentions some information that might go into the
> description field, such as the team that owns the connector and emergency
> contact info. It seems like this info might benefit from a little more
> structure if we're trying to design for programmatic access by GUIs and
> CLIs (which I'm assuming is the case, since I can't imagine a human being
> getting much use out of the raw output from the GET
> /connector-plugins/{name}/config and PUT
> /connector-plugins/{name}/config/validate endpoints). This might also make
> it easier to add custom validation logic around what kind of information is
> present via REST extension.
>
>
> With these thoughts in mind, what do you think about adding a new generic
> "tags" object to connectors that can support arbitrary user-provided
> key/value pairs? If using the POST /connectors endpoint, it might look
> something like this:
>
> {
>   "name": "my-connector",
>   "config": {
>     "connector.class": "MirrorSource",
>     "tasks.max": "908"
>   },
>   "tags": {
>     "team": "data-infra",
>     "phone": "12345678901",
>     "email": "din...@example.org"
>   }
> }
>
> And, to allow users to modify connector tags after one has been created,
> we might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
> verbs that writes tag info for a connector to the config topic, but without
> altering the actual connector config (allowing us to skip a rebalance
> altogether).
>
> One other thing we could consider is allowing cluster administrators to
> require that connectors are created with a certain set of tags, or even add
> per-tag regex validation. They might specify something like
> "connector.required.tags = team, phone, email" or
> "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
> is probably overboard for now, especially since it's already possible to
> accomplish via a REST extension.
>
> Finally, I'm also wondering if, in pursuit of expanding Connect's
> out-of-the-box support for dealing with connector metadata, we might want
> to expose created/last-updated times for connector configurations. We
> definitely don't have to do that in this KIP but if you agree that this
> would be useful, we should probably keep it in mind to avoid painting
> ourselves into a corner. This is why I'm thinking of using "tags" instead
> of something a little more generic like "metadata", BTW.
>
> [1] -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>
> Cheers,
>
> Chris
>
> On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I created a very small KIP to add a description field to connectors:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
>>
>> Let me know if you have any feedback.
>>
>> Thanks,
>> Mickael
>>
>

Reply via email to