Thanks Chris for the discussion. Since nobody else expressed interest in this KIP, I'm going to discard it.
On Wed, Mar 8, 2023 at 8:41 PM Chris Egerton <chr...@aiven.io.invalid> wrote: > > Hi Mickael, > > I do sympathize with the desire for a "quick fix". I think your point about > these being problematic to support sums up my hesitation here pretty well, > both with respect to the potential footgun of unintended rebalances (should > users try to do more with this field than we expect), and with making other > similar improvements (i.e., expanded metadata support in Kafka Connect) > more difficult to design correctly for us and to use effectively for users > (especially if we were to deprecate/remove the description field at a later > date). > > It's also worth noting that users can already add a "description" field to > the JSON config for any connector they create; the benefit provided by this > KIP is that they're automatically prompted to do so if they're using a > UI/CLI/etc. that builds on top of the various /connector-plugins endpoints > to find the set of configuration properties that a user can/should populate > before creating a connector. > > I'd welcome thoughts from you and others on whether the tradeoffs here are > worth it; right now I'm still not sure about this one. > > Cheers, > > Chris > > On Tue, Mar 7, 2023 at 6:42 AM Mickael Maison <mickael.mai...@gmail.com> > wrote: > > > H Chris, > > > > Thanks for taking a look. > > > > 1. Yes updating the description can potentially trigger a rebalance. I > > don't expect users to frequently update the description so I thought > > this was acceptable. I've added a note to the KIP to mention it. > > > > 2. The tags model you described could be interesting but it looks > > pretty involved with multiple new endpoints and brand new concepts. > > > > With this KIP, I really took the most basic approach and proposed the > > simplest set of changes that could get in the next release and, I > > think, immediately bring benefits. However sometimes this is not the > > best approach as a "quick fix" could end up problematic to support in > > the future. The drawbacks (new connector config field + causing a > > rebalance on update) look relatively benign to me so I thought this > > could be an acceptable proposal. > > > > Thanks, > > Mickael > > > > > > > > On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <chr...@aiven.io.invalid> > > wrote: > > > > > > Actually, I misspoke--a rebalance isn't triggered when an existing > > > connector's config is updated. Assuming the set of workers remains > > stable, > > > a rebalance is only necessary when a new connector is created, an > > existing > > > one is deleted, or a new set of task configs is generated. > > > > > > This weakens the point about unnecessary rebalances when a connector's > > > description is updated, but doesn't entirely address it. Spurious > > > rebalances may still be triggered if a new set of task configs is > > > generated, which for reasons outlined above, is fairly likely. > > > > > > On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <chr...@aiven.io> wrote: > > > > > > > Hi Mickael, > > > > > > > > Thanks for the KIP! > > > > > > > > While it's tempting to add this field to the out-of-the-box connector > > > > config def, I'm a little hesitant, for two reasons. > > > > > > > > 1. Adding this directly to the connector config could have unintended > > > > consequences on the actual data processing by the connector. Any time a > > > > connector's config is modified, the Connector object running for it is > > > > restarted with that new config. In most cases this is a trivial > > operation > > > > since we have incremental rebalancing enabled by default, the > > connector can > > > > (and probably should!) generate task configs that are functionally > > > > identical to the ones it last generated, and most (though not all) > > > > Connector classes are fairly lightweight and leave the real work to > > their > > > > Task class. However, due to KAFKA-9228 [1], it's not just common > > practice > > > > for connectors to perform transparent passthrough of most of their > > configs > > > > when generating task configs, it's actually necessary to work around a > > bug > > > > in the runtime. As a result, tweaking the description of a connector > > would > > > > be fairly likely to result in a full restart of all of its tasks, in > > > > addition to triggering two rebalances (which may not be so lightweight > > if > > > > users are still running with eager rebalancing... which, sadly, I've > > heard > > > > is still happening today). > > > > > > > > 2. The motivation section mentions some information that might go into > > the > > > > description field, such as the team that owns the connector and > > emergency > > > > contact info. It seems like this info might benefit from a little more > > > > structure if we're trying to design for programmatic access by GUIs and > > > > CLIs (which I'm assuming is the case, since I can't imagine a human > > being > > > > getting much use out of the raw output from the GET > > > > /connector-plugins/{name}/config and PUT > > > > /connector-plugins/{name}/config/validate endpoints). This might also > > make > > > > it easier to add custom validation logic around what kind of > > information is > > > > present via REST extension. > > > > > > > > > > > > With these thoughts in mind, what do you think about adding a new > > generic > > > > "tags" object to connectors that can support arbitrary user-provided > > > > key/value pairs? If using the POST /connectors endpoint, it might look > > > > something like this: > > > > > > > > { > > > > "name": "my-connector", > > > > "config": { > > > > "connector.class": "MirrorSource", > > > > "tasks.max": "908" > > > > }, > > > > "tags": { > > > > "team": "data-infra", > > > > "phone": "12345678901", > > > > "email": "din...@example.org" > > > > } > > > > } > > > > > > > > And, to allow users to modify connector tags after one has been > > created, > > > > we might introduce a /connectors/{name}/tags endpoint with > > PUT/PATCH/DELETE > > > > verbs that writes tag info for a connector to the config topic, but > > without > > > > altering the actual connector config (allowing us to skip a rebalance > > > > altogether). > > > > > > > > One other thing we could consider is allowing cluster administrators to > > > > require that connectors are created with a certain set of tags, or > > even add > > > > per-tag regex validation. They might specify something like > > > > "connector.required.tags = team, phone, email" or > > > > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But > > this > > > > is probably overboard for now, especially since it's already possible > > to > > > > accomplish via a REST extension. > > > > > > > > Finally, I'm also wondering if, in pursuit of expanding Connect's > > > > out-of-the-box support for dealing with connector metadata, we might > > want > > > > to expose created/last-updated times for connector configurations. We > > > > definitely don't have to do that in this KIP but if you agree that this > > > > would be useful, we should probably keep it in mind to avoid painting > > > > ourselves into a corner. This is why I'm thinking of using "tags" > > instead > > > > of something a little more generic like "metadata", BTW. > > > > > > > > [1] - > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration > > > > > > > > Cheers, > > > > > > > > Chris > > > > > > > > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison < > > mickael.mai...@gmail.com> > > > > wrote: > > > > > > > >> Hi, > > > >> > > > >> I created a very small KIP to add a description field to connectors: > > > >> > > > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration > > > >> > > > >> Let me know if you have any feedback. > > > >> > > > >> Thanks, > > > >> Mickael > > > >> > > > > > >