Really interesting ideas and attempts, Kenn/Luke. One inline:

On Fri, Oct 19, 2018 at 7:11 PM Lukasz Cwik <[email protected]> wrote:
> I have tried several times to improve the build system and IntelliJ
> integration, and each attempt ended with little progress when dealing
> with vendored code. My latest attempt has been the most promising: I take
> the vendored classes/jars and decompile them, generating source that
> IntelliJ can then use. I have a branch[1] that demonstrates the idea. It
> works pretty well (and up until a change where we started vendoring gRPC,
> was impractical to do). Instructions to try it out are:
>
> // Clean up any remnants of prior builds/IntelliJ projects
> git clean -fdx
> // Generate the source for vendored/shaded modules
> ./gradlew decompile
>
> // Remove the "generated" Java sources for protos so they don't conflict
> // with the decompiled sources.
> rm -rf model/pipeline/build/generated/source/proto
> rm -rf model/job-management/build/generated/source/proto
> rm -rf model/fn-execution/build/generated/source/proto
> // Import the project into IntelliJ; most code completion now works,
> // still some issues with a few classes.
> // Note that the Java decompiler doesn't generate valid source, so we
> // still need to delegate to Gradle for build/run/test actions.
> // Other decompilers may do a better/worse job but I haven't tried them.
>
> The problems that I face are that the generated Java source from the
> protos and the decompiled source from the compiled version of that source
> post shading are both being imported as content roots and then conflict.
> Also, the CFR decompiler isn't producing valid source; if people could
> try others and report their mileage, we may find one that works, and then
> we would be able to use IntelliJ to build/run our code and not need to
> delegate all our build/run/test actions to Gradle.
> After all these attempts I have made, vendoring the dependencies outside
> of the project seems like a sane approach, and unless someone wants to
> take a stab at the best progress I have made above, I would go with what
> Kenn is suggesting, even though it will mean that we will need to perform
> releases every time we want to change the version of one of our vendored
> dependencies.
>
> 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>
> On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <[email protected]> wrote:
>
>> Another reason to push on this is to get build times down. Once only
>> generated proto classes use the shadow plugin, we'll cut the build time
>> in ~half? And there is no reason to constantly re-vendor.
>>
>> Kenn
>>
>> On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> A while ago we had pretty good consensus that we should vendor
>>> ("pre-shade") specific dependencies, and there's a start on it here:
>>> https://github.com/apache/beam/tree/master/vendor
>>>
>>> IntelliJ notes:
>>>
>>> - With shading, it is hard (impossible?) to step into dependency code
>>> in IntelliJ's debugger, because the actual symbol at runtime does not
>>> match what is in the external jars
>>>
> IntelliJ can step through the classes if they were published outside the
> project, since it can decompile them. The source won't be legible.
> Decompiling the source as in my example effectively shows the same issue.

Sorry if this was covered elsewhere in this thread, but it sounds like what
we need is to publish `-sources.jar`s alongside our vendored `.jar`s, to
allow IntelliJ's normal flow for [aligning .class files and corresponding
sources] to work.
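To make that concrete, here is a minimal sketch (Python; the vendored
prefix and jar names below are hypothetical illustrations, not anything
Beam ships today) of repacking an upstream `-sources.jar` so its entries
sit under a vendored package path while the file contents, including
package and import statements, are left untouched:

```python
import zipfile


def repack_sources_jar(src_jar, dest_jar, prefix):
    """Copy every .java entry of src_jar into dest_jar under the given
    vendored package prefix (e.g. 'org/apache/beam/vendor/grpc/v1'),
    leaving file contents unchanged. Non-source entries (META-INF etc.)
    are copied through as-is."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dest_jar, "w", zipfile.ZIP_DEFLATED) as dest:
        for entry in src.infolist():
            if entry.is_dir():
                continue
            name = entry.filename
            # Relocate only Java sources; keep manifests etc. in place.
            new_name = prefix + "/" + name if name.endswith(".java") else name
            dest.writestr(new_name, src.read(entry))
```

Whether IntelliJ will then pair the relocated `.class` files with these
moved sources, given the un-vendored package statements still inside them,
is exactly the open question above.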
If we started with our dependencies' existing `-sources.jar`s, and renamed
the files in those JARs to appear to live in the vendored packages (but
left the files' contents alone, so they'd contain package and import
statements with the un-vendored names), my impression is that IntelliJ
would still align the .class files to the intended files and line numbers
(even though the bytecode technically wouldn't match the source files).
This assumes the bytecode vendoring doesn't change the line numbers in the
bytecode, which I hope is true, but don't know for sure.

Another option would be doing the vendoring at the source level, rather
than the bytecode level, and publishing vendored `.jar`s and
`-sources.jar`s that actually match. These would be indistinguishable from
any other library, to IntelliJ (and in general), so all downstream tooling
should work. We'd have to maintain a fork of each vendored dep where we
check out the release we want, apply a script that renames all package and
import statements, and publish the results. It would be a bit more
involved, but possibly less brittle. If tricking IntelliJ with a modified
`-sources.jar` as I described previously would work, we could do that, and
not bother with this.

>>> - With vendoring, if the vendored dependencies are part of the project
>>> then IntelliJ gets confused because it operates on source, not the
>>> produced jars
>>>
> Yes, I tried several ways to get IntelliJ to ignore the source and use
> the output jars, but no luck.
>
>>> The second one has a quick fix for most cases*: don't make the vendored
>>> dep a subproject, but just separately build and publish it. Since a
>>> vendored dependency should change much more infrequently (or if we bake
>>> the version into the name, ~never), this means we publish once and save
>>> headache and build time forever.
>>>
>>> WDYT? Have I overlooked something? How about we set up vendored
>>> versions of guava, protobuf, grpc, and publish them.
>>> We don't have to actually start using them yet, and can do it
>>> incrementally.
>>>
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on the semantic versioning provided by the dependency to
> not break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downsides of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ
> only by the minor or patch version, increasing the size of their
> application.
> 2) Bumping the version of a dependency now requires the import statements
> in all Java files to be updated (not too difficult with some sed/grep
> skills).
>
> The upsides of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning, which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency, which is known to be
> incompatible in certain situations (e.g. Netty 4.1.25 can't be on the
> classpath with Netty 4.1.28, even though they are both shaded, due to how
> JNI with tcnative works).
>
>>> (side note: what do other projects like Flink do?)
>>>
>>> Kenn
>>>
>>> *for generated proto classes, they need to be altered after being
>>> generated so shading happens there, but actually only relocation, and
>>> the shared vendored dep should work elsewhere in the project
>>>
> We could publish the protos and treat them as "external" dependencies
> within the Java projects, which would also remove this pain point.
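For what it's worth, the relocation convention Luke describes maps cleanly
to a tiny helper. A sketch in Python (the `org.apache.beam.vendor` base
and the `v`-prefix/underscore formatting are my assumptions about the
naming scheme, not confirmed from the build):

```python
def relocated_prefix(artifact, version, base="org.apache.beam.vendor"):
    """Compute the relocated package prefix for a vendored dependency:
    for major version >= 1, trust semantic versioning and keep only the
    major version; for 0.x, assume instability and bake in the full
    version (dots replaced so it forms a valid Java package segment)."""
    major = int(version.split(".")[0])
    if major >= 1:
        suffix = "v%d" % major
    else:
        suffix = "v" + version.replace(".", "_")
    return "%s.%s.%s" % (base, artifact, suffix)

# e.g. relocated_prefix("grpc", "1.13.1") -> "org.apache.beam.vendor.grpc.v1"
#      relocated_prefix("foo", "0.20.0")  -> "org.apache.beam.vendor.foo.v0_20_0"
```

This also makes the trade-off above concrete: bumping a 0.x dependency
changes the prefix, so every import must be rewritten, while a 1.x patch
bump leaves the prefix (and all imports) alone.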
