Thanks Ewen. I'm replying inline as well. On Tue, May 2, 2017 at 11:24 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> Thanks for the KIP. > > A few responses inline, followed by additional comments. > > On Mon, May 1, 2017 at 9:50 PM, Konstantine Karantasis < > konstant...@confluent.io> wrote: > > > Gwen, Randall thank you for your very insightful observations. I'm glad > you > > find this first draft to be an adequate platform for discussion. > > > > I'll attempt replying to your comments in order. > > > > Gwen, I also debated exactly the same two options: a) interpreting > absence > > of module path as a user's intention to turn off isolation and b) > > explicitly using an additional boolean property. A few reasons why I went > > with b) in this first draft are: > > 1) As Randall mentions, to leave the option of using a default value > open. > > If not immediately in the first version of isolation, maybe in the > future. > > 2) I didn't like the implicit character of the choice of interpreting an > > empty string as a clear intention to turn isolation off by the user. Half > > the time could be just that users forget to set a location, although > they'd > > like to use class loading isolation. > > 3) There's a slim possibility that in rare occasions a user might want to > > avoid even the slightest increase in memory consumption due to class > > loading duplication. I admit this should be very rare, but given the > other > > concerns and that we would really like to keep the isolation > implementation > > simple, the option to turn off this feature by using only one additional > > config property might not seem too excessive. At least at the start of > this > > discussion. > > 4) Debugging during development might be simpler in some cases. > > 5) Finally, as you mention, this could allow for smoother upgrades. > > > > I'm not sure any of these keep you from removing the extra config. Is there > any reason you couldn't have clean support for relying on the CLASSPATH > while still supporting the classloaders? Then getting people onto the new > classloaders does require documentation for how to install connectors, but > that's pretty minimal. And we don't break existing installations where > people are just adding to the CLASSPATH. It seems like this: > > 1. Allows you to set a default. Isolation is always enabled, but we won't > include any paths/directories we already use. Setting a default just > requires specifying a new location where we'd hold these directories. > 2. It doesn't require the implicit choice -- you actually never turn off > isolation, but still support the regular CLASSPATH with an empty list of > isolated loaders > 3. The user can still use CLASSPATH if they want to minimize classloader > overhead > 4. Debugging can still use CLASSPATH > 5. Upgrades just work. > Falling back to CLASSPATH for non-isolated mode makes sense. The extra config property was suggested proactively, as well as for clarity and handling of defaults. But it's much better if we can do without it. Will be removed. > > > > Randall, regarding your comments: > > 1) To keep its focus narrow, this KIP, as well as the first > implementation > > of isolation in Connect, assume filesystem based discovery. With careful > > implementation, transitioning to discovery schemes that support broader > > URIs I believe should be easy in the future. > > > > Maybe just mention a couple of quick examples in the KIP. When described > inline it might be more obvious that it will extend cleanly. > > There's an example for a filesystem-based structure. I will enhance it. > > 2) The example you give makes a good point. However I'm inclined to say > > that such cases should be addressed more as exceptions rather than as > being > > the common case. Therefore, I wouldn't see all dependencies imported by > the > > framework as required to be filtered out, because in that case we lose > the > > advantage of isolation between the framework and the connectors (and we > are > > left only with isolation between connectors). > > 3) I tried to abstract implementation details in this the KIP, but you are > > right. Even though filtering here is mainly used semantically rather than > > literally, it gives an implementation hint that we could avoid. > > > > I think we're missing another option -- don't do filtering and require that > those dependencies are correctly filtered out of the modules. If we want to > be nicer about this, we could also detect maybe 2 or 3 classes while > scanning for Connectors/Converters/Transformations that indicate the > classloader has jars that it shouldn't and warn about it. I can't think of > that many that would be an issue -- basically connect-api, connect-runtime > if they really mess it up, and maybe slf4j. > > This sounds a bit more strict, which is not necessarily a bad thing. Still, it seems to also keep the subject under the category of implementation decisions. > 4) In the same spirit as in 3) I believe we should reserve enough > > flexibility to the implementation to discover and load classes, when they > > appear in multiple locations under the general module location. > > > > > And a couple of addition comments: > > - module.path should be module.paths if it accepts multiple paths > The configuration property was named in a way that resembles classpath. I felt this might offer some intuition, even if the word "path" is too filesystem oriented. Users are used to give a list of paths in classpath, so I didn't find singular too confusing. Additionally, it will work when there's only one "path", or I should say URI, given. But I don't feel very strong about the name of this property. (Alternatives that I tried such as module.uris, module.locations didn't seem satisfying) > - I think the description of the module paths is a bit confusing because I > think for simpler configuration we actually want to specify the *parent* > directory of module paths. I definitely prefer this since it's simpler > although I am not certain how it will mix with future extensions to other > URL handlers (where a zip file or http URL wouldn't do the same discovery > within subdirectories) > My impression is that the 2nd paragraph below the introduction of module.path is where the "parent" convention is attempted to be explained. I will attempt to improve this text, but I feel that, again, the filesystem convention will follow the generic definition of the property (a list of locations/paths). -Konstantine > > -Ewen > > > > Thanks again! Let me know what you think. > > Konstantine > > > > > > On Mon, May 1, 2017 at 10:12 AM, Randall Hauch <rha...@gmail.com> wrote: > > > > > Very nice work, Konstantine. Conflicting dependencies of connectors is > > > indeed a big issue that makes it hard to manage installed connectors. > > > > > > I do like Gwen's idea about removing the 'module.isolation.enabled' > > > property. However, I would have anticipated always using classpath > > > isolation for *only* those components registered under the module path > > and > > > not really for anything else already on the normal classpath. So, > people > > > could continue to place custom connector JARs onto the classpath, > though > > > this would become deprecated in favor of installing custom connector > > JARs / > > > modules via the module path. This keeps configuration simple, gives > > people > > > time to migrate, but let's people that need classpath isolation get it > to > > > install a variety of connectors each with their dependencies that > > > potentially conflict with other components. > > > > > > The challenge is whether there should be a default for 'module.path'. > > > Ideally there would be so that users know where they can install their > > > connectors. However, I suspect that this might be difficult to do > unless > > it > > > can make use of system properties such as "${kafka.home}" so that > > relative > > > directories can be specified. > > > > > > A few other questions/comments: > > > > > > 1) Does the KIP have to specify how are components / modules installed, > > > discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to > > > just specify the semantics of the file system module path (e.g., the > > > directories below those specified in the module path are to be unique > and > > > identify an installed component). > > > > > > 2) Will the module classloader filtering also have to exclude Kafka > > Connect > > > dependencies? The only one that I can think of is the SLF4J API, which > > > can't be loaded from the module's classloader if the connector is to > send > > > its log messages to the same logging system. > > > > > > 3) Rather than specify filtering, would be it a bit more flexible to > > simply > > > say that the implementation will need to ensure that Java, Kafka > Connect, > > > and other third party APIs (e.g., SLF4J API) will not be loaded from > the > > > module classloaders? It'd be better to avoid specifying how it will be > > > done, just in case the implementation needs to evolve or use a > different > > > technique (e.g., load the Java and public Kafka Connect APIs via one > > > classloader that is reused and that always appears before the module > > > classloader, while Kafka Connect implementation JARs appear after the > > > component's classloader. > > > > > > 4) Perhaps to address #2 and #3 above, perhaps the KIP could explicitly > > > specify the classloader order for a deployed connector. For example, > > > 'java', 'kafka-connect-apis', 'connector-module', 'smt-module-1', ..., > > > 'kafka-connect-impls', where 'connector-module' is the classloader for > > the > > > (first) module where the connector is found, 'smt-module-1' is the > > > classloader for the (first) module where the first SMT class is found > (if > > > specified and found in a separate module), 'smt-module-2' is the > > > classloader .... Might also need to say that the KIP does not specify > how > > > the implementation will pick the module if a specified class if found > in > > > more than one module. > > > > > > Thoughts? > > > > > > Randall > > > > > > On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira <g...@confluent.io> > wrote: > > > > > > > Hi Konstantine, > > > > > > > > Thank you so much for driving this! The connector classpath mess is > > > driving > > > > me nuts (or worse, driving me to use Docker). > > > > > > > > I like the proposal for micro-benchmarks to test the context > switching > > > > overhead. > > > > > > > > I have a difficult time figuring out the module.isolation.enabled. > > > > Especially with a default to false. I can't think of a reason that > > anyone > > > > will not want classpath isolation. "No! I want my connectors to mess > up > > > > each other's dependencies" said no one ever. > > > > > > > > So it looks like this is mostly for upgrade purpose? Because the > > initial > > > > upgrade will not have the module.path set and therefore classpath > > > isolation > > > > will simply not work by default? > > > > > > > > In that case, why don't we simply use the existence of non-empty > > > > module.path as an indicator of whether isolation should work or not? > > seem > > > > simpler and intuitive to me. > > > > > > > > Thanks! > > > > > > > > Gwen > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis < > > > > konstant...@confluent.io> wrote: > > > > > > > > > * Because of KIP number collision, please disregard my previous KIP > > > > > announcement and use this thread for discussion instead * > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > we aim to address dependency conflicts in Kafka Connect soon by > > > applying > > > > > class loading isolation. > > > > > > > > > > Feel free to take a look at KIP-146 here: > > > > > > > > > > *https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 146+-+Classloading+Isolation+in+Connect > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 146+-+Classloading+Isolation+in+Connect>* > > > > > > > > > > which describes minimal required changes to public interfaces and > the > > > > > general implementation approach. > > > > > > > > > > This is a much wanted feature for Kafka Connect. Your feedback is > > > highly > > > > > appreciated. > > > > > > > > > > -Konstantine > > > > > > > > > > > > > > > > > > > > > -- > > > > *Gwen Shapira* > > > > Product Manager | Confluent > > > > 650.450.2760 | @gwenshap > > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog > > > > <http://www.confluent.io/blog> > > > > > > > > > >