Hi Greg, Thanks for the KIP. Overall I think using service loading would be a worthwhile improvement.
1. "Both of these are well-established Java interfaces for which public documentation should be readily available." I see no harm in adding a well chosen link or two just to save readers unfamiliar with service loading from having to search for documentation. 2. "plugin.path.discovery" is a bit of an odd name, since it's not discovering plugin _paths_, rather it's discovering plugins. So perhaps plugin.discovery, or plugin.discovery.policy would be a better name. 3. I wondered if we really needed "ONLY_SCAN". It seems like the sort of thing where someone silences the warnings then forgets about it until one day they (or worse, someone else) gets the pleasure of upgrading and finding they can't find their plugins. I guess the proposed roll out, over multiple major releases, makes this less likely. Non-the-less, logged warnings seem like a small price to pay overall, and would make it harder for users to ignore the problem. We should be encouraging users to use the migration script to silence warnings and adopt SERVICE_LOAD. That way they get the benefits sooner than Kafka 5.0. 4. As described, connect-plugin-path.sh can modify jar files in-place. Have you tested/considered whether this will work for sealed jars? I know they're not common, but they are part of the Java platform so we can't rule out that they're being used. I _think_ service loading should still work even if you wrote a new shim jar file which just contained the service files needed for loading (alongside the original unmodified jar containing the plugins); if that's right then there's no need to actually modify existing jars. Thanks again, Tom On Wed, 8 Feb 2023 at 21:45, Chris Egerton <chr...@aiven.io.invalid> wrote: > Hi Greg, > > Thanks for the updates. This is looking great. A few minor questions: > > 1. The description for the list command in the plugin path CLI states that > it'll show "Whether the class is discoverable via scanning". Are there any > cases where a plugin found by the CLI won't be discoverable via scanning? > > 2. Why are we treating module info files differently than service loader > manifests (where the latter will be removed for not-found plugins on an > opt-out basis, but the former are left unmodified no matter what)? Basing > this off of this sentence in the KIP: "If a plugin declares an > implementation via a module-info.java which is not loadable, the > module-info.java will not be modified."; let me know if I'm > misinterpreting. > > Cheers, > > Chris > > On Tue, Feb 7, 2023 at 4:32 PM Greg Harris <greg.har...@aiven.io.invalid> > wrote: > > > Chris, > > > > Thanks for the comments! > > > > 1. I've updated the script to sync-manifests [--keep-not-found] with your > > described semantics. > > > > 2. I've pushed all of the deprecation decisions to a follow-up KIP that > can > > take place after an intervening release. > > I hope that this feature has a high enough ROI that the direction of the > > migration is obvious to users even without a formal deprecation notice. > > I do think that without a formal deprecation notice that the migration > > scripts may be in use for a long time, particularly for connector > > implementations which are not receiving active maintenance. > > > > 3. Thanks for looking into the ServiceLoader error handling contract. > > From my inspection of the ServiceLoader implementation, it seems to be > able > > to recover from most of the poor packaging scenarios that I'm aware of. > > I think with or without the deprecation, operators are free to select the > > ONLY_SCAN mode to work around any errors we don't discover before > release. > > > > Thanks, > > Greg > > > > On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <chr...@aiven.io.invalid> > > wrote: > > > > > Hi Greg, > > > > > > Thanks for the updates. Unless stated below, I agree with your > > responses. A > > > few more thoughts: > > > > > > 1. IMO it's not worth it to have separate commands for adding/removing > > > manifests, mostly because it adds complexity to the tool and might make > > it > > > harder for users to understand. I think a single "update-manifests" or > > > "sync-manifests" command that both adds manifests for found plugins and > > > removes manifests for unfound plugins by default, with a --keep-unfound > > > flag to disable removal on an opt-in basis, would make more sense. > > > > > > 2. I agree with this point you raise about deprecation: "if we agree > that > > > the old mechanism has some sunset date that we just don't know yet, we > > > should still communicate to users that the sunset date is somewhere in > > the > > > future." However, I'm not certain that releasing the features proposed > in > > > this KIP alone gives us enough confidence in them to sunset the legacy > > > plugin loading logic, which is why I was hoping we could have at least > > one > > > successful release (preferably even two or three) with this feature in > > > order to identify potential gaps in the design before proceeding. This > is > > > similar to the strategy we took with incremental rebalancing in > Connect, > > > which introduced the "connect.protocol" property with the proposal that > > it > > > could be marked deprecated in the next major release. That strategy > > served > > > us well, since there were a few kinks in the logic for that feature > that > > > needed to be worked out, and many users continued to rely on the legacy > > > rebalancing logic as a workaround for those issues. > > > > > > 3. I was wondering about error handling with the service loading > > mechanism > > > because the Javadocs for the Iterator returned by > ServiceLoader::iterator > > > [1] state that "If an error is thrown then subsequent invocations of > the > > > iterator will make a best effort to locate and instantiate the next > > > available provider, but in general such recovery cannot be guaranteed." > > > Upon further reflection, I agree that it's fine for now to leave > details > > on > > > error handling out of the public contract for this new loading mode, > but > > > only if we also agree to hold off on deprecating the old loading mode, > > > since there may be cases where worker startup is crippled by an > > > improperly-packaged plugin when using the service loader mechanism for > > its > > > plugins. > > > > > > [1] - > > > > > > > > > https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator() > > > > > > Cheers, > > > > > > Chris > > > > > > On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fedeval...@gmail.com > > > > > wrote: > > > > > > > Hi Greg, > > > > > > > > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris > > > > <greg.har...@aiven.io.invalid> wrote: > > > > > > > > > > Federico, > > > > > > > > > > Thanks for taking a look! > > > > > > > > > > > I was just wondering if we should use a better script name like > > > > > > "connect-convert-to-service-provider.sh" or something like this > > > > > > > > > > I agree that the current name is not ideal, while the current name > > > > > describes part of what it is doing, it does not describe all of > what > > it > > > > is > > > > > doing, or why that is important to the caller. > > > > > I looked around to see what style the other Kafka commands use, and > > it > > > > > seems like there's not a standard pattern. Some of the scripts are > > > > > noun-phrases, and some are verb-phrases. > > > > > Some take commands like `create`, `format`, etc, and some take > > options > > > > like > > > > > `--generate`, `--execute`, `--verify`. > > > > > I've updated the command to `bin/connect-plugin-path.sh > > > > > (list|add-manifests|remove-manifests) --worker-config > > <worker-config>`, > > > > let > > > > > me know if that's better or worse than > > > `bin/connect-scan-plugin-path.sh`. > > > > > > > > > > > maybe add a --dry-run option. > > > > > > > > > > Thanks for the suggestion, I've updated the KIP to include a > > --dry-run > > > > > option with some typical semantics. > > > > > > > > This is much better. Thanks! > > > > > > > > I think it's better to not deprecate the add-manifests and > > > > remove-manifests script sub-commands. When we will remove the > > > > deprecated plugin discovery modes, one may still have the need to > > > > convert an old connector release, maybe because it's the only version > > > > compatible with a third-party external system. > > > > > > > > > > > > > > Chris, > > > > > > > > > > I'm glad you liked the personas! Since this KIP requires others' > help > > > to > > > > > improve the ecosystem, I needed a way to make sure everyone knows > > what > > > > role > > > > > they have to play. I hope I've accomplished that. > > > > > > > > > > > 1. Will workers' resiliency to packaging issues be affected when > > the > > > > new > > > > > > scanning behavior is enabled? For example, will a single > > > > poorly-packaged > > > > > > plugin prevent other correctly-packaged plugins from being > loaded, > > or > > > > > > worse, cause the worker to fail? And either way, are there any > > > details > > > > > that > > > > > > we can call out to back up any guarantees we want to provide on > > that > > > > > front? > > > > > > > > > > The current behavior when encountering packaging issues seems to be > > to > > > > log > > > > > errors and continue, and that will certainly continue for the > > ONLY_SCAN > > > > and > > > > > HYBID_WARN modes. > > > > > As written, the HYBRID_FAIL mode breaks from this trend and will > > throw > > > > > exceptions that crash the worker only when the connectors are > missing > > > > from > > > > > serviceloader discovery. > > > > > The behavior for SERVICE_LOAD is not currently defined in the KIP. > I > > > > think > > > > > we can either keep the log-and-continue behavior whenever possible, > > or > > > > > begin to surface errors in a fail-fast model. > > > > > And if we can't choose between those two, perhaps we can add > > > > > SERVICE_LOAD_FAIL_FAST, or a > > plugin.path.discovery.fail.fast=true/false > > > > to > > > > > allow users to configure the worker to fail to start up when > > > plugin.path > > > > > issues are present. > > > > > > > > > > With regards to incorrectly packaged connectors affecting other > > > correctly > > > > > packaged ones: I think one of the current behaviors of the > > Reflections > > > > > implementation is that certain exceptions can prevent discovery of > > > other > > > > > unrelated plugins on the path. > > > > > I think that this may be able to be improved in all modes, and > > doesn't > > > > need > > > > > the ServiceLoader mechanism to resolve. I'll explore this some more > > and > > > > > pursue a fix outside of this KIP. > > > > > The new mechanism should be as good or better than the Reflections > > > > > mechanism in this regard. I am not sure if it is necessary to > commit > > to > > > > the > > > > > error handling behavior as part of the public API. > > > > > > > > > > > 2. I think the description of the "--plugin-location" flag may > need > > > > some > > > > > > updating? In the middle of the list of bullet points there's "The > > > > value of > > > > > > this argument will be a single plugin, which can be any of the > > > > following:" > > > > > > but the following points don't appear to be related. > > > > > > > > > > Thanks, I think that was a copy-paste error. I have updated the > KIP. > > > > > > > > > > > 3. (Nit) Could we rename the migration script something like > > > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path" > > > with a > > > > > > "migrate" subcommand? > > > > > > > > > > Yes, you and Federico had the same idea :) > > > > > Please see my feedback above regarding naming. > > > > > > > > > > > 4. It's noted that "If a plugin class is removed from the path, > the > > > > > > corresponding shim manifest should also be removed". Did you > > consider > > > > > > adding behavior (possibly guarded behind a CLI flag) to find and > > > remove > > > > > > manifests for nonexistent plugins? > > > > > > > > > > When writing that originally, I thought that the script would only > > > remove > > > > > manifests that it had generated, as filtered by some in-file > comment > > or > > > > by > > > > > jar filename, which would be an implementation detail. > > > > > If we were to add a CLI flag to also enable/disable the removal > > > behavior, > > > > > we could eliminate that implementation detail and make the removal > an > > > > > all-or-nothing part of the script. > > > > > > > > > > I updated the script to accept a `remove-manifests` subcommand > which > > I > > > > > think captures your suggestion. Therefore, a full sync would > include > > > two > > > > > invocations: an `add-manifests` and `remove-manifests`. > > > > > > > > > > > 5. What's the concrete plan for updating the plugins that ship > OOTB > > > > with > > > > > > Connect? Will they be given a service loader manifest, a module > > info > > > > file, > > > > > > or both? > > > > > > > > > > At a minimum, we need to add service loader manifests in this KIP > to > > be > > > > > compliant with the new discovery model. > > > > > This is because Kafka 3.x supports Java 8, and that version of the > > > > > ServiceLoader can only read manifest files, and not module-info > > files. > > > > > Beyond that, I think it becomes a question of scope, and whether we > > > > should > > > > > add module-info files for all of the Connect components. > > > > > I was not planning on doing so within this KIP, and would prefer to > > > leave > > > > > that for a follow-up KIP. > > > > > > > > > > I have clarified this in the KIP. > > > > > > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to > plugins > > > > that > > > > > > are found by both the legacy scanning logic and the new service > > > > > > loader/module info loading logic? > > > > > > > > > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, > so > > > the > > > > > same classes are discovered in the same order. > > > > > All of the classes that are found via the ServiceLoader mechanism > > > should > > > > > only be used by the runtime to generate warnings. > > > > > I believe that by the nature of the Reflections library, all > > instances > > > > > which are discoverable by the ServiceLoader should be discoverable > by > > > > > Reflections, and so we don't need to merge the two discovery > methods > > or > > > > > handle precedence. > > > > > > > > > > > 7. It's a bit strange to immediately deprecate config property > > values > > > > and > > > > > > the migration script in their first release. IMO as long as we > have > > > the > > > > > > migration script in the project, it doesn't make sense to > deprecate > > > it, > > > > > and > > > > > > until we've had at least one successful release with the new > > loading > > > > logic > > > > > > and some confidence in it, we should not deprecate the older > > logic. I > > > > > think > > > > > > a more reasonable pace for the update here could be to deprecate > > the > > > > > legacy > > > > > > scanning logic in 4.0 (changing the default accordingly) and > remove > > > it > > > > > > entirely in 5.0. > > > > > > > > > > I understand that it seems a bit silly to introduce something as > > > > > deprecated, but I think that's because 'deprecated' is not the same > > as > > > > > 'old', and not mutually exclusive with 'new'. > > > > > The way I understand it, marking something as deprecated and the > > > process > > > > of > > > > > deprecation is about communicating the direction in which an API is > > > > > evolving, and which methods can be relied upon for the foreseeable > > > > future. > > > > > And if we agree that the old mechanism has some sunset date that we > > > just > > > > > don't know yet, we should still communicate to users that the > sunset > > > date > > > > > is somewhere in the future. > > > > > > > > > > I've updated the KIP to describe separate follow-on votes for > > changing > > > > the > > > > > default, and removing the scanning behavior, as that was my > original > > > > intent > > > > > but I don't think that was made clear. > > > > > I also pushed back the deprecation of the migration sub-commands, > > since > > > > > we'd like to usher people towards the migration script instead of > > > relying > > > > > on the HYBRID modes. > > > > > I kept the decision for deprecating the non-SERVICE_LOAD modes in > the > > > > > current vote for now, but if introducing deprecated configurations > is > > > too > > > > > unpalatable we can move the deprecation to a later vote. > > > > > > > > > > Thanks, > > > > > Greg > > > > > > > > > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton > > <chr...@aiven.io.invalid > > > > > > > > > wrote: > > > > > > > > > > > Hi Greg, > > > > > > > > > > > > Thanks for the KIP! This part of Connect has been due for an > update > > > > for a > > > > > > while, it's nice to see it getting some attention. I especially > > like > > > > how > > > > > > the migration plan is broken down into affected personas; makes > > > review > > > > > > easier and, hopefully, makes the feature easier to understand for > > > > affected > > > > > > users. > > > > > > > > > > > > Here are my initial thoughts: > > > > > > > > > > > > 1. Will workers' resiliency to packaging issues be affected when > > the > > > > new > > > > > > scanning behavior is enabled? For example, will a single > > > > poorly-packaged > > > > > > plugin prevent other correctly-packaged plugins from being > loaded, > > or > > > > > > worse, cause the worker to fail? And either way, are there any > > > details > > > > that > > > > > > we can call out to back up any guarantees we want to provide on > > that > > > > front? > > > > > > > > > > > > 2. I think the description of the "--plugin-location" flag may > need > > > > some > > > > > > updating? In the middle of the list of bullet points there's "The > > > > value of > > > > > > this argument will be a single plugin, which can be any of the > > > > following:" > > > > > > but the following points don't appear to be related. > > > > > > > > > > > > 3. (Nit) Could we rename the migration script something like > > > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path" > > > with a > > > > > > "migrate" subcommand? > > > > > > > > > > > > 4. It's noted that "If a plugin class is removed from the path, > the > > > > > > corresponding shim manifest should also be removed". Did you > > consider > > > > > > adding behavior (possibly guarded behind a CLI flag) to find and > > > remove > > > > > > manifests for nonexistent plugins? > > > > > > > > > > > > 5. What's the concrete plan for updating the plugins that ship > OOTB > > > > with > > > > > > Connect? Will they be given a service loader manifest, a module > > info > > > > file, > > > > > > or both? > > > > > > > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to > plugins > > > > that > > > > > > are found by both the legacy scanning logic and the new service > > > > > > loader/module info loading logic? > > > > > > > > > > > > 7. It's a bit strange to immediately deprecate config property > > values > > > > and > > > > > > the migration script in their first release. IMO as long as we > have > > > the > > > > > > migration script in the project, it doesn't make sense to > deprecate > > > > it, and > > > > > > until we've had at least one successful release with the new > > loading > > > > logic > > > > > > and some confidence in it, we should not deprecate the older > > logic. I > > > > think > > > > > > a more reasonable pace for the update here could be to deprecate > > the > > > > legacy > > > > > > scanning logic in 4.0 (changing the default accordingly) and > remove > > > it > > > > > > entirely in 5.0. > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Chris > > > > > > > > > > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri < > > > fedeval...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi Greg, this looks like a useful change to me. > > > > > > > > > > > > > > I was just wondering if we should use a better script name like > > > > > > > "connect-convert-to-service-provider.sh" or something like > this, > > > and > > > > > > > maybe add a --dry-run option. > > > > > > > > > > > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris > > > > > > > <greg.har...@aiven.io.invalid> wrote: > > > > > > > > > > > > > > > > Hi all! > > > > > > > > > > > > > > > > I'd like to start a discussion about > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery > > > > > > > > . > > > > > > > > > > > > > > > > Thanks! > > > > > > > > Greg Harris > > > > > > > > > > > > > > > > > > > > > > >