Really interesting ideas and attempts, Kenn/Luke. One inline:

On Fri, Oct 19, 2018 at 7:11 PM Lukasz Cwik <[email protected]> wrote:
> I have tried several times to improve the build system and IntelliJ
> integration, and each attempt ended with little progress when dealing
> with vendored code. My latest attempt has been the most promising: I take
> the vendored classes/jars and decompile them, generating source that
> IntelliJ can then use. I have a branch[1] that demonstrates the idea. It
> works pretty well (and up until a change where we started vendoring gRPC,
> was impractical to do). Instructions to try it out are:
>
> // Clean up any remnants of prior builds/IntelliJ projects
> git clean -fdx
> // Generate the source for vendored/shaded modules
> ./gradlew decompile
>
> // Remove the "generated" Java sources for protos so they don't conflict
> // with the decompiled sources.
> rm -rf model/pipeline/build/generated/source/proto
> rm -rf model/job-management/build/generated/source/proto
> rm -rf model/fn-execution/build/generated/source/proto
> // Import the project into IntelliJ; most code completion now works,
> // still some issues with a few classes.
> // Note that the Java decompiler doesn't generate valid source, so we
> // still need to delegate to Gradle for build/run/test actions.
> // Other decompilers may do a better/worse job but I haven't tried them.
>
> The problems that I face are that the generated Java source from the
> protos and the decompiled source from the compiled version of that source
> post shading are both being imported as content roots and then conflict.
> Also, the CFR decompiler isn't producing valid source; if people could
> try others and report their mileage, we may find one that works, and then
> we would be able to use IntelliJ to build/run our code and not need to
> delegate all our build/run/test actions to Gradle.
> After all these attempts I have made, vendoring the dependencies outside
> of the project seems like a sane approach, and unless someone wants to
> take a stab at the best progress I have made above, I would go with what
> Kenn is suggesting, even though it will mean that we will need to perform
> releases every time we want to change the version of one of our vendored
> dependencies.
>
> 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>
> On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <[email protected]> wrote:
>
>> Another reason to push on this is to get build times down. Once only
>> generated proto classes use the shadow plugin, we'll cut the build time
>> in ~half? And there is no reason to constantly re-vendor.
>>
>> Kenn
>>
>> On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> A while ago we had pretty good consensus that we should vendor
>>> ("pre-shade") specific dependencies, and there's a start on it here:
>>> https://github.com/apache/beam/tree/master/vendor
>>>
>>> IntelliJ notes:
>>>
>>> - With shading, it is hard (impossible?) to step into dependency code
>>> in IntelliJ's debugger, because the actual symbol at runtime does not
>>> match what is in the external jars
>>>
> IntelliJ can step through the classes if they were published outside the
> project, since it can decompile them. The source won't be legible.
> Decompiling the source as in my example effectively shows the same issue.

Sorry if this was covered elsewhere in this thread, but it sounds like what
we need is to publish `-sources.jar`s alongside our vendored `.jar`s, to
allow IntelliJ's normal flow for [aligning .class files and corresponding
sources] to work.
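To make that concrete, here is a minimal sketch (Python; the vendored
prefix and jar names below are hypothetical illustrations, not anything
Beam ships today) of repacking an upstream `-sources.jar` so its entries
sit under a vendored package path while the file contents, including
package and import statements, are left untouched:

```python
import zipfile


def repack_sources_jar(src_jar, dest_jar, prefix):
    """Copy every .java entry of src_jar into dest_jar under the given
    vendored package prefix (e.g. 'org/apache/beam/vendor/grpc/v1'),
    leaving file contents unchanged. Non-source entries (META-INF etc.)
    are copied through as-is."""
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dest_jar, "w", zipfile.ZIP_DEFLATED) as dest:
        for entry in src.infolist():
            if entry.is_dir():
                continue
            name = entry.filename
            # Relocate only Java sources; keep manifests etc. in place.
            new_name = prefix + "/" + name if name.endswith(".java") else name
            dest.writestr(new_name, src.read(entry))
```

Whether IntelliJ will then pair the relocated `.class` files with these
moved sources, given the un-vendored package statements still inside them,
is exactly the open question above.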
If we started with our dependencies' existing `-sources.jar`s, and renamed
the files in those JARs to appear to live in the vendored packages (but
left the files' contents alone, so they'd contain package and import
statements with the un-vendored names), my impression is that IntelliJ
would still align the .class files to the intended files and line numbers
(even though the bytecode technically wouldn't match the source files).
This assumes the bytecode vendoring doesn't change the line numbers in the
bytecode, which I hope is true, but don't know for sure.

Another option would be doing the vendoring at the source level, rather
than the bytecode level, and publishing vendored `.jar`s and
`-sources.jar`s that actually match. These would be indistinguishable from
any other library, to IntelliJ (and in general), so all downstream tooling
should work. We'd have to maintain a fork of each vendored dep where we
check out the release we want, apply a script that renames all package and
import statements, and publish the results. It would be a bit more
involved, but possibly less brittle. If tricking IntelliJ with a modified
`-sources.jar` as I described previously would work, we could do that, and
not bother with this.

>>> - With vendoring, if the vendored dependencies are part of the project
>>> then IntelliJ gets confused because it operates on source, not the
>>> produced jars
>>>
> Yes, I tried several ways to get IntelliJ to ignore the source and use
> the output jars, but no luck.
>
>>> The second one has a quick fix for most cases*: don't make the vendored
>>> dep a subproject, but just separately build and publish it. Since a
>>> vendored dependency should change much more infrequently (or if we bake
>>> the version into the name, ~never), this means we publish once and save
>>> headache and build time forever.
>>>
>>> WDYT? Have I overlooked something? How about we set up vendored
>>> versions of guava, protobuf, grpc, and publish them.
>>> We don't have to actually start using them yet, and can do it
>>> incrementally.
>>>
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on the semantic versioning provided by the dependency to
> not break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downsides of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ
> only by the minor or patch version, increasing the size of their
> application.
> 2) Bumping the version of a dependency now requires the import statements
> in all Java files to be updated (not too difficult with some sed/grep
> skills).
>
> The upsides of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning, which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency, which is known to be
> incompatible in certain situations (e.g. Netty 4.1.25 can't be on the
> classpath with Netty 4.1.28, even though they are both shaded, due to how
> JNI with tcnative works).
>
>>> (side note: what do other projects like Flink do?)
>>>
>>> Kenn
>>>
>>> *for generated proto classes, they need to be altered after being
>>> generated so shading happens there, but actually only relocation, and
>>> the shared vendored dep should work elsewhere in the project
>>>
> We could publish the protos and treat them as "external" dependencies
> within the Java projects, which would also remove this pain point.
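For what it's worth, the relocation convention Luke describes maps cleanly
to a tiny helper. A sketch in Python (the `org.apache.beam.vendor` base
and the `v`-prefix/underscore formatting are my assumptions about the
naming scheme, not confirmed from the build):

```python
def relocated_prefix(artifact, version, base="org.apache.beam.vendor"):
    """Compute the relocated package prefix for a vendored dependency:
    for major version >= 1, trust semantic versioning and keep only the
    major version; for 0.x, assume instability and bake in the full
    version (dots replaced so it forms a valid Java package segment)."""
    major = int(version.split(".")[0])
    if major >= 1:
        suffix = "v%d" % major
    else:
        suffix = "v" + version.replace(".", "_")
    return "%s.%s.%s" % (base, artifact, suffix)

# e.g. relocated_prefix("grpc", "1.13.1") -> "org.apache.beam.vendor.grpc.v1"
#      relocated_prefix("foo", "0.20.0")  -> "org.apache.beam.vendor.foo.v0_20_0"
```

This also makes the trade-off above concrete: bumping a 0.x dependency
changes the prefix, so every import must be rewritten, while a 1.x patch
bump leaves the prefix (and all imports) alone.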
