Actually, I misspoke--a rebalance isn't triggered when an existing connector's config is updated. Assuming the set of workers remains stable, a rebalance is only necessary when a new connector is created, an existing one is deleted, or a new set of task configs is generated.
This weakens the point about unnecessary rebalances when a connector's description is updated, but doesn't entirely address it. Spurious rebalances may still be triggered if a new set of task configs is generated, which for reasons outlined above, is fairly likely. On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <chr...@aiven.io> wrote: > Hi Mickael, > > Thanks for the KIP! > > While it's tempting to add this field to the out-of-the-box connector > config def, I'm a little hesitant, for two reasons. > > 1. Adding this directly to the connector config could have unintended > consequences on the actual data processing by the connector. Any time a > connector's config is modified, the Connector object running for it is > restarted with that new config. In most cases this is a trivial operation > since we have incremental rebalancing enabled by default, the connector can > (and probably should!) generate task configs that are functionally > identical to the ones it last generated, and most (though not all) > Connector classes are fairly lightweight and leave the real work to their > Task class. However, due to KAFKA-9228 [1], it's not just common practice > for connectors to perform transparent passthrough of most of their configs > when generating task configs, it's actually necessary to work around a bug > in the runtime. As a result, tweaking the description of a connector would > be fairly likely to result in a full restart of all of its tasks, in > addition to triggering two rebalances (which may not be so lightweight if > users are still running with eager rebalancing... which, sadly, I've heard > is still happening today). > > 2. The motivation section mentions some information that might go into the > description field, such as the team that owns the connector and emergency > contact info. It seems like this info might benefit from a little more > structure if we're trying to design for programmatic access by GUIs and > CLIs (which I'm assuming is the case, since I can't imagine a human being > getting much use out of the raw output from the GET > /connector-plugins/{name}/config and PUT > /connector-plugins/{name}/config/validate endpoints). This might also make > it easier to add custom validation logic around what kind of information is > present via REST extension. > > > With these thoughts in mind, what do you think about adding a new generic > "tags" object to connectors that can support arbitrary user-provided > key/value pairs? If using the POST /connectors endpoint, it might look > something like this: > > { > "name": "my-connector", > "config": { > "connector.class": "MirrorSource", > "tasks.max": "908" > }, > "tags": { > "team": "data-infra", > "phone": "12345678901", > "email": "din...@example.org" > } > } > > And, to allow users to modify connector tags after one has been created, > we might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE > verbs that writes tag info for a connector to the config topic, but without > altering the actual connector config (allowing us to skip a rebalance > altogether). > > One other thing we could consider is allowing cluster administrators to > require that connectors are created with a certain set of tags, or even add > per-tag regex validation. They might specify something like > "connector.required.tags = team, phone, email" or > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this > is probably overboard for now, especially since it's already possible to > accomplish via a REST extension. > > Finally, I'm also wondering if, in pursuit of expanding Connect's > out-of-the-box support for dealing with connector metadata, we might want > to expose created/last-updated times for connector configurations. We > definitely don't have to do that in this KIP but if you agree that this > would be useful, we should probably keep it in mind to avoid painting > ourselves into a corner. This is why I'm thinking of using "tags" instead > of something a little more generic like "metadata", BTW. > > [1] - > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration > > Cheers, > > Chris > > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mickael.mai...@gmail.com> > wrote: > >> Hi, >> >> I created a very small KIP to add a description field to connectors: >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration >> >> Let me know if you have any feedback. >> >> Thanks, >> Mickael >> >