Chris, Thanks for the comments!
1. I've updated the script to sync-manifests [--keep-not-found] with your described semantics. 2. I've pushed all of the deprecation decisions to a follow-up KIP that can take place after an intervening release. I hope that this feature has a high enough ROI that the direction of the migration is obvious to users even without a formal deprecation notice. I do think that without a formal deprecation notice that the migration scripts may be in use for a long time, particularly for connector implementations which are not receiving active maintenance. 3. Thanks for looking into the ServiceLoader error handling contract. >From my inspection of the ServiceLoader implementation, it seems to be able to recover from most of the poor packaging scenarios that I'm aware of. I think with or without the deprecation, operators are free to select the ONLY_SCAN mode to work around any errors we don't discover before release. Thanks, Greg On Mon, Feb 6, 2023 at 8:44 AM Chris Egerton <chr...@aiven.io.invalid> wrote: > Hi Greg, > > Thanks for the updates. Unless stated below, I agree with your responses. A > few more thoughts: > > 1. IMO it's not worth it to have separate commands for adding/removing > manifests, mostly because it adds complexity to the tool and might make it > harder for users to understand. I think a single "update-manifests" or > "sync-manifests" command that both adds manifests for found plugins and > removes manifests for unfound plugins by default, with a --keep-unfound > flag to disable removal on an opt-in basis, would make more sense. > > 2. I agree with this point you raise about deprecation: "if we agree that > the old mechanism has some sunset date that we just don't know yet, we > should still communicate to users that the sunset date is somewhere in the > future." However, I'm not certain that releasing the features proposed in > this KIP alone gives us enough confidence in them to sunset the legacy > plugin loading logic, which is why I was hoping we could have at least one > successful release (preferably even two or three) with this feature in > order to identify potential gaps in the design before proceeding. This is > similar to the strategy we took with incremental rebalancing in Connect, > which introduced the "connect.protocol" property with the proposal that it > could be marked deprecated in the next major release. That strategy served > us well, since there were a few kinks in the logic for that feature that > needed to be worked out, and many users continued to rely on the legacy > rebalancing logic as a workaround for those issues. > > 3. I was wondering about error handling with the service loading mechanism > because the Javadocs for the Iterator returned by ServiceLoader::iterator > [1] state that "If an error is thrown then subsequent invocations of the > iterator will make a best effort to locate and instantiate the next > available provider, but in general such recovery cannot be guaranteed." > Upon further reflection, I agree that it's fine for now to leave details on > error handling out of the public contract for this new loading mode, but > only if we also agree to hold off on deprecating the old loading mode, > since there may be cases where worker startup is crippled by an > improperly-packaged plugin when using the service loader mechanism for its > plugins. > > [1] - > > https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#iterator() > > Cheers, > > Chris > > On Tue, Jan 24, 2023 at 10:26 AM Federico Valeri <fedeval...@gmail.com> > wrote: > > > Hi Greg, > > > > On Tue, Jan 24, 2023 at 2:48 AM Greg Harris > > <greg.har...@aiven.io.invalid> wrote: > > > > > > Federico, > > > > > > Thanks for taking a look! > > > > > > > I was just wondering if we should use a better script name like > > > > "connect-convert-to-service-provider.sh" or something like this > > > > > > I agree that the current name is not ideal, while the current name > > > describes part of what it is doing, it does not describe all of what it > > is > > > doing, or why that is important to the caller. > > > I looked around to see what style the other Kafka commands use, and it > > > seems like there's not a standard pattern. Some of the scripts are > > > noun-phrases, and some are verb-phrases. > > > Some take commands like `create`, `format`, etc, and some take options > > like > > > `--generate`, `--execute`, `--verify`. > > > I've updated the command to `bin/connect-plugin-path.sh > > > (list|add-manifests|remove-manifests) --worker-config <worker-config>`, > > let > > > me know if that's better or worse than > `bin/connect-scan-plugin-path.sh`. > > > > > > > maybe add a --dry-run option. > > > > > > Thanks for the suggestion, I've updated the KIP to include a --dry-run > > > option with some typical semantics. > > > > This is much better. Thanks! > > > > I think it's better to not deprecate the add-manifests and > > remove-manifests script sub-commands. When we will remove the > > deprecated plugin discovery modes, one may still have the need to > > convert an old connector release, maybe because it's the only version > > compatible with a third-party external system. > > > > > > > > Chris, > > > > > > I'm glad you liked the personas! Since this KIP requires others' help > to > > > improve the ecosystem, I needed a way to make sure everyone knows what > > role > > > they have to play. I hope I've accomplished that. > > > > > > > 1. Will workers' resiliency to packaging issues be affected when the > > new > > > > scanning behavior is enabled? For example, will a single > > poorly-packaged > > > > plugin prevent other correctly-packaged plugins from being loaded, or > > > > worse, cause the worker to fail? And either way, are there any > details > > > that > > > > we can call out to back up any guarantees we want to provide on that > > > front? > > > > > > The current behavior when encountering packaging issues seems to be to > > log > > > errors and continue, and that will certainly continue for the ONLY_SCAN > > and > > > HYBID_WARN modes. > > > As written, the HYBRID_FAIL mode breaks from this trend and will throw > > > exceptions that crash the worker only when the connectors are missing > > from > > > serviceloader discovery. > > > The behavior for SERVICE_LOAD is not currently defined in the KIP. I > > think > > > we can either keep the log-and-continue behavior whenever possible, or > > > begin to surface errors in a fail-fast model. > > > And if we can't choose between those two, perhaps we can add > > > SERVICE_LOAD_FAIL_FAST, or a plugin.path.discovery.fail.fast=true/false > > to > > > allow users to configure the worker to fail to start up when > plugin.path > > > issues are present. > > > > > > With regards to incorrectly packaged connectors affecting other > correctly > > > packaged ones: I think one of the current behaviors of the Reflections > > > implementation is that certain exceptions can prevent discovery of > other > > > unrelated plugins on the path. > > > I think that this may be able to be improved in all modes, and doesn't > > need > > > the ServiceLoader mechanism to resolve. I'll explore this some more and > > > pursue a fix outside of this KIP. > > > The new mechanism should be as good or better than the Reflections > > > mechanism in this regard. I am not sure if it is necessary to commit to > > the > > > error handling behavior as part of the public API. > > > > > > > 2. I think the description of the "--plugin-location" flag may need > > some > > > > updating? In the middle of the list of bullet points there's "The > > value of > > > > this argument will be a single plugin, which can be any of the > > following:" > > > > but the following points don't appear to be related. > > > > > > Thanks, I think that was a copy-paste error. I have updated the KIP. > > > > > > > 3. (Nit) Could we rename the migration script something like > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path" > with a > > > > "migrate" subcommand? > > > > > > Yes, you and Federico had the same idea :) > > > Please see my feedback above regarding naming. > > > > > > > 4. It's noted that "If a plugin class is removed from the path, the > > > > corresponding shim manifest should also be removed". Did you consider > > > > adding behavior (possibly guarded behind a CLI flag) to find and > remove > > > > manifests for nonexistent plugins? > > > > > > When writing that originally, I thought that the script would only > remove > > > manifests that it had generated, as filtered by some in-file comment or > > by > > > jar filename, which would be an implementation detail. > > > If we were to add a CLI flag to also enable/disable the removal > behavior, > > > we could eliminate that implementation detail and make the removal an > > > all-or-nothing part of the script. > > > > > > I updated the script to accept a `remove-manifests` subcommand which I > > > think captures your suggestion. Therefore, a full sync would include > two > > > invocations: an `add-manifests` and `remove-manifests`. > > > > > > > 5. What's the concrete plan for updating the plugins that ship OOTB > > with > > > > Connect? Will they be given a service loader manifest, a module info > > file, > > > > or both? > > > > > > At a minimum, we need to add service loader manifests in this KIP to be > > > compliant with the new discovery model. > > > This is because Kafka 3.x supports Java 8, and that version of the > > > ServiceLoader can only read manifest files, and not module-info files. > > > Beyond that, I think it becomes a question of scope, and whether we > > should > > > add module-info files for all of the Connect components. > > > I was not planning on doing so within this KIP, and would prefer to > leave > > > that for a follow-up KIP. > > > > > > I have clarified this in the KIP. > > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins > > that > > > > are found by both the legacy scanning logic and the new service > > > > loader/module info loading logic? > > > > > > In HYBRID_WARN, the control-flow should be the same as SCAN_ONLY, so > the > > > same classes are discovered in the same order. > > > All of the classes that are found via the ServiceLoader mechanism > should > > > only be used by the runtime to generate warnings. > > > I believe that by the nature of the Reflections library, all instances > > > which are discoverable by the ServiceLoader should be discoverable by > > > Reflections, and so we don't need to merge the two discovery methods or > > > handle precedence. > > > > > > > 7. It's a bit strange to immediately deprecate config property values > > and > > > > the migration script in their first release. IMO as long as we have > the > > > > migration script in the project, it doesn't make sense to deprecate > it, > > > and > > > > until we've had at least one successful release with the new loading > > logic > > > > and some confidence in it, we should not deprecate the older logic. I > > > think > > > > a more reasonable pace for the update here could be to deprecate the > > > legacy > > > > scanning logic in 4.0 (changing the default accordingly) and remove > it > > > > entirely in 5.0. > > > > > > I understand that it seems a bit silly to introduce something as > > > deprecated, but I think that's because 'deprecated' is not the same as > > > 'old', and not mutually exclusive with 'new'. > > > The way I understand it, marking something as deprecated and the > process > > of > > > deprecation is about communicating the direction in which an API is > > > evolving, and which methods can be relied upon for the foreseeable > > future. > > > And if we agree that the old mechanism has some sunset date that we > just > > > don't know yet, we should still communicate to users that the sunset > date > > > is somewhere in the future. > > > > > > I've updated the KIP to describe separate follow-on votes for changing > > the > > > default, and removing the scanning behavior, as that was my original > > intent > > > but I don't think that was made clear. > > > I also pushed back the deprecation of the migration sub-commands, since > > > we'd like to usher people towards the migration script instead of > relying > > > on the HYBRID modes. > > > I kept the decision for deprecating the non-SERVICE_LOAD modes in the > > > current vote for now, but if introducing deprecated configurations is > too > > > unpalatable we can move the deprecation to a later vote. > > > > > > Thanks, > > > Greg > > > > > > On Mon, Jan 23, 2023 at 12:49 PM Chris Egerton <chr...@aiven.io.invalid > > > > > wrote: > > > > > > > Hi Greg, > > > > > > > > Thanks for the KIP! This part of Connect has been due for an update > > for a > > > > while, it's nice to see it getting some attention. I especially like > > how > > > > the migration plan is broken down into affected personas; makes > review > > > > easier and, hopefully, makes the feature easier to understand for > > affected > > > > users. > > > > > > > > Here are my initial thoughts: > > > > > > > > 1. Will workers' resiliency to packaging issues be affected when the > > new > > > > scanning behavior is enabled? For example, will a single > > poorly-packaged > > > > plugin prevent other correctly-packaged plugins from being loaded, or > > > > worse, cause the worker to fail? And either way, are there any > details > > that > > > > we can call out to back up any guarantees we want to provide on that > > front? > > > > > > > > 2. I think the description of the "--plugin-location" flag may need > > some > > > > updating? In the middle of the list of bullet points there's "The > > value of > > > > this argument will be a single plugin, which can be any of the > > following:" > > > > but the following points don't appear to be related. > > > > > > > > 3. (Nit) Could we rename the migration script something like > > > > "connect-migrate-plugin-path" or even just "connect-plugin-path" > with a > > > > "migrate" subcommand? > > > > > > > > 4. It's noted that "If a plugin class is removed from the path, the > > > > corresponding shim manifest should also be removed". Did you consider > > > > adding behavior (possibly guarded behind a CLI flag) to find and > remove > > > > manifests for nonexistent plugins? > > > > > > > > 5. What's the concrete plan for updating the plugins that ship OOTB > > with > > > > Connect? Will they be given a service loader manifest, a module info > > file, > > > > or both? > > > > > > > > 6. In the HYBRID_WARN mode, what precedence will we give to plugins > > that > > > > are found by both the legacy scanning logic and the new service > > > > loader/module info loading logic? > > > > > > > > 7. It's a bit strange to immediately deprecate config property values > > and > > > > the migration script in their first release. IMO as long as we have > the > > > > migration script in the project, it doesn't make sense to deprecate > > it, and > > > > until we've had at least one successful release with the new loading > > logic > > > > and some confidence in it, we should not deprecate the older logic. I > > think > > > > a more reasonable pace for the update here could be to deprecate the > > legacy > > > > scanning logic in 4.0 (changing the default accordingly) and remove > it > > > > entirely in 5.0. > > > > > > > > Cheers, > > > > > > > > Chris > > > > > > > > On Mon, Jan 23, 2023 at 3:21 AM Federico Valeri < > fedeval...@gmail.com> > > > > wrote: > > > > > > > > > Hi Greg, this looks like a useful change to me. > > > > > > > > > > I was just wondering if we should use a better script name like > > > > > "connect-convert-to-service-provider.sh" or something like this, > and > > > > > maybe add a --dry-run option. > > > > > > > > > > On Tue, Jan 17, 2023 at 8:45 PM Greg Harris > > > > > <greg.har...@aiven.io.invalid> wrote: > > > > > > > > > > > > Hi all! > > > > > > > > > > > > I'd like to start a discussion about > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery > > > > > > . > > > > > > > > > > > > Thanks! > > > > > > Greg Harris > > > > > > > > > > > >