I think it makes sense for each vendored dependency to be self-contained as
much as possible. It should keep it fairly simple. Things that cross their
API surface cannot be hidden, of course. Jar size is not a concern IMO.

Kenn

On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected]> wrote:

> How should we handle the transitive dependencies of the things we want to
> vendor?
>
> For example we use gRPC which depends on Guava 20 and we also use Calcite
> which depends on Guava 19.
>
> Should the vendored gRPC/Calcite/... be self-contained so it contains all
> its dependencies, hence vendored gRPC would contain Guava 20 and vendored
> Calcite would contain Guava 19 (both under different namespaces)?
> This leads to larger jars but less vendored dependencies to maintain.
>
> Or should we produce a vendored library for those that we want to share,
> e.g. Guava 20 that could be reused across multiple vendored libraries?
> Makes the vendoring process slightly more complicated, more dependencies
> to maintain, smaller jars.
>
> Or should we produce a vendored library for each dependency?
> Lots of vendoring needed, likely tooling required to be built to maintain
> this.
>
>
>
>
> On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]> wrote:
>
>> I actually created the subtasks by finding things shaded by at least one
>> module. I think each one should definitely have an on list discussion that
>> clarifies the target artifact, namespace, version, possible complications,
>> etc.
>>
>> My impression is that many many modules shade only Guava. So for build
>> time and simplification that is a big win.
>>
>> Kenn
>>
>> On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]> wrote:
>>
>>> +1 for separate artifacts
>>>
>>> I would request that we explicitly discuss and agree which dependencies
>>> we vendor though.
>>>
>>> Not everything listed in the JIRA subtasks is currently relocated.
>>>
>>> Thomas
>>>
>>>
>>> On Tue, Oct 23, 2018 at 8:04 AM David Morávek <[email protected]>
>>> wrote:
>>>
>>>> +1 This should improve build times a lot. It would be great if vendored
>>>> deps could stay in the main repository.
>>>>
>>>> D.
>>>>
>>>> On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels <[email protected]>
>>>> wrote:
>>>>
>>>>> Looks great, Kenn!
>>>>>
>>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>>> Did that make it easier to manage in some way?
>>>>>
>>>>> Better separation of concerns, but I don't think releasing the shaded
>>>>> artifacts from the main repo is a problem. I'd even prefer not to
>>>>> split
>>>>> up the repo because it makes updates to the vendored dependencies
>>>>> slightly easier.
>>>>>
>>>>> On 23.10.18 03:25, Kenneth Knowles wrote:
>>>>> > OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to
>>>>> > collect sub-tasks. This has enough upsides throughout lots of areas
>>>>> of
>>>>> > the project that even though it is not glamorous it seems pretty
>>>>> > valuable to start on immediately. And I want to find out if there's
>>>>> a
>>>>> > pitfall lurking.
>>>>> >
>>>>> > Max: what is the story behind having a separate flink-shaded repo?
>>>>> Did
>>>>> > that make it easier to manage in some way?
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <[email protected]
>>>>> > <mailto:[email protected]>> wrote:
>>>>> >
>>>>> >     +1 for publishing vendored Jars independently. It will improve
>>>>> build
>>>>> >     time and ease IntelliJ integration.
>>>>> >
>>>>> >     Flink also publishes shaded dependencies separately:
>>>>> >
>>>>> >     - https://github.com/apache/flink-shaded
>>>>> >     - https://issues.apache.org/jira/browse/FLINK-6529
>>>>> >
>>>>> >     AFAIK their main motivation was to get rid of duplicate shaded
>>>>> classes
>>>>> >     on the classpath. We don't appear to have that problem because we
>>>>> >     already have a separate "vendor" project.
>>>>> >
>>>>> >      >  - With shading, it is hard (impossible?) to step into
>>>>> dependency
>>>>> >     code in IntelliJ's debugger, because the actual symbol at runtime
>>>>> >     does not match what is in the external jars
>>>>> >
>>>>> >     This would be solved by releasing the sources of the shaded jars.
>>>>> >      From a
>>>>> >     legal perspective, this could be problematic as alluded to here:
>>>>> >     https://github.com/apache/flink-shaded/issues/25
>>>>> >
>>>>> >     -Max
>>>>> >
>>>>> >     On 20.10.18 01:11, Lukasz Cwik wrote:
>>>>> >      > I have tried several times to improve the build system and
>>>>> intellij
>>>>> >      > integration and each attempt ended with little progress when
>>>>> dealing
>>>>> >      > with vendored code. My latest attempt has been the most
>>>>> promising
>>>>> >     where
>>>>> >      > I take the vendored classes/jars and decompile them
>>>>> generating the
>>>>> >      > source that Intellij can then use. I have a branch[1] that
>>>>> >     demonstrates
>>>>> >      > the idea. It works pretty well (and up until a change where we
>>>>> >     started
>>>>> >      > vendoring gRPC, was impractical to do. Instructions to try it
>>>>> out
>>>>> >     are:
>>>>> >      >
>>>>> >      > // Clean up any remnants of prior builds/intellij projects
>>>>> >      > git clean -fdx
>>>>> >      > // Generated the source for vendored/shaded modules
>>>>> >      > ./gradlew decompile
>>>>> >      >
>>>>> >      > // Remove the "generated" Java sources for protos so they
>>>>> don't
>>>>> >     conflict with the decompiled sources.
>>>>> >      > rm -rf model/pipeline/build/generated/source/proto
>>>>> >      > rm -rf model/job-management/build/generated/source/proto
>>>>> >      > rm -rf model/fn-execution/build/generated/source/proto
>>>>> >      > // Import the project into Intellij, most code completion now
>>>>> >     works still some issues with a few classes.
>>>>> >      > // Note that the Java decompiler doesn't generate valid
>>>>> source so
>>>>> >     still need to delegate to Gradle for build/run/test actions
>>>>> >      > // Other decompilers may do a better/worse job but haven't
>>>>> tried
>>>>> >     them.
>>>>> >      >
>>>>> >      >
>>>>> >      > The problems that I face are that the generated Java source
>>>>> from the
>>>>> >      > protos and the decompiled source from the compiled version of
>>>>> that
>>>>> >      > source post shading are both being imported as content roots
>>>>> and
>>>>> >     then
>>>>> >      > conflict. Also, the CFR decompiler isn't producing valid
>>>>> source, if
>>>>> >      > people could try others and report their mileage, we may find
>>>>> one
>>>>> >     that
>>>>> >      > works and then we would be able to use intellij to build/run
>>>>> our
>>>>> >     code
>>>>> >      > and not need to delegate all our build/run/test actions to
>>>>> Gradle.
>>>>> >      >
>>>>> >      > After all these attempts I have done, vendoring the
>>>>> dependencies
>>>>> >     outside
>>>>> >      > of the project seems like a sane approach and unless someone
>>>>> >     wants to
>>>>> >      > take a stab at the best progress I have made above, I would go
>>>>> >     with what
>>>>> >      > Kenn is suggesting even though it will mean that we will need
>>>>> to
>>>>> >     perform
>>>>> >      > releases every time we want to change the version of one of
>>>>> our
>>>>> >     vendored
>>>>> >      > dependencies.
>>>>> >      >
>>>>> >      > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
>>>>> >      >
>>>>> >      >
>>>>> >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <
>>>>> [email protected]
>>>>> >     <mailto:[email protected]>
>>>>> >      > <mailto:[email protected] <mailto:[email protected]>>> wrote:
>>>>> >      >
>>>>> >      >     Another reason to push on this is to get build times down.
>>>>> >     Once only
>>>>> >      >     generated proto classes use the shadow plugin we'll cut
>>>>> the build
>>>>> >      >     time in ~half? And there is no reason to constantly
>>>>> re-vendor.
>>>>> >      >
>>>>> >      >     Kenn
>>>>> >      >
>>>>> >      >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
>>>>> >     <[email protected] <mailto:[email protected]>
>>>>> >      >     <mailto:[email protected] <mailto:[email protected]>>> wrote:
>>>>> >      >
>>>>> >      >         Hi all,
>>>>> >      >
>>>>> >      >         A while ago we had pretty good consensus that we
>>>>> should
>>>>> >     vendor
>>>>> >      >         ("pre-shade") specific dependencies, and there's
>>>>> start on it
>>>>> >      >         here:
>>>>> https://github.com/apache/beam/tree/master/vendor
>>>>> >      >
>>>>> >      >         IntelliJ notes:
>>>>> >      >
>>>>> >      >           - With shading, it is hard (impossible?) to step
>>>>> into
>>>>> >      >         dependency code in IntelliJ's debugger, because the
>>>>> actual
>>>>> >      >         symbol at runtime does not match what is in the
>>>>> external jars
>>>>> >      >
>>>>> >      >
>>>>> >      > Intellij can step through the classes if they were published
>>>>> >     outside the
>>>>> >      > project since it can decompile them. The source won't be
>>>>> legible.
>>>>> >      > Decompiling the source as in my example effectively shows the
>>>>> >     same issue.
>>>>> >      >
>>>>> >      >           - With vendoring, if the vendored dependencies are
>>>>> part
>>>>> >     of the
>>>>> >      >         project then IntelliJ gets confused because it
>>>>> operates on
>>>>> >      >         source, not the produced jars
>>>>> >      >
>>>>> >      >
>>>>> >      > Yes, I tried several ways to get intellij to ignore the source
>>>>> >     and use
>>>>> >      > the output jars but no luck.
>>>>> >      >
>>>>> >      >         The second one has a quick fix for most cases*: don't
>>>>> >     make the
>>>>> >      >         vendored dep a subproject, but just separately build
>>>>> and
>>>>> >     publish
>>>>> >      >         it. Since a vendored dependency should change much
>>>>> more
>>>>> >      >         infrequently (or if we bake the version into the name,
>>>>> >     ~never)
>>>>> >      >         this means we publish once and save headache and build
>>>>> >     time forever.
>>>>> >      >
>>>>> >      >         WDYT? Have I overlooked something? How about we set up
>>>>> >     vendored
>>>>> >      >         versions of guava, protobuf, grpc, and publish them.
>>>>> We don't
>>>>> >      >         have to actually start using them yet, and can do it
>>>>> >     incrementally.
>>>>> >      >
>>>>> >      >
>>>>> >      > Currently we are relocating code depending on the version
>>>>> string.
>>>>> >     If the
>>>>> >      > major version is >= 1, we use only the major version within
>>>>> the
>>>>> >     package
>>>>> >      > string and rely on semantic versioning provided by the
>>>>> dependency
>>>>> >     to not
>>>>> >      > break people. If the major version is 0, we assume the
>>>>> dependency is
>>>>> >      > unstable and use the full version as part of the package
>>>>> string
>>>>> >     during
>>>>> >      > relocation.
>>>>> >      >
>>>>> >      > The downside of using the full version string for relocated
>>>>> packages:
>>>>> >      > 1) Users will end up with multiple copies of dependencies that
>>>>> >     differ
>>>>> >      > only by the minor or patch version increasing the size of
>>>>> their
>>>>> >     application.
>>>>> >      > 2) Bumping up the version of a dependency now requires the
>>>>> import
>>>>> >      > statement in all java files to be updated (not too difficult
>>>>> with
>>>>> >     some
>>>>> >      > sed/grep skills)
>>>>> >      >
>>>>> >      > The upside of using the full version string in the relocated
>>>>> package:
>>>>> >      > 1) We don't have to worry about whether a dependency maintains
>>>>> >     semantic
>>>>> >      > versioning which means our users won't have to worry about
>>>>> that
>>>>> >     either.
>>>>> >      > 2) This increases the odds that a user will load multiple
>>>>> slightly
>>>>> >      > different versions of the same dependency which is known to be
>>>>> >      > incompatible in certain situations (e.g. Netty 4.1.25 can't
>>>>> be on
>>>>> >     the
>>>>> >      > classpath with Netty 4.1.28 even though they are both shaded
>>>>> due to
>>>>> >      > issues of how JNI with tcnative works).
>>>>> >      >
>>>>> >      >
>>>>> >      >         (side note: what do other projects like Flink do?)
>>>>> >      >
>>>>> >      >         Kenn
>>>>> >      >
>>>>> >      >         *for generated proto classes, they need to be altered
>>>>> after
>>>>> >      >         being generated so shading happens there, but
>>>>> actually only
>>>>> >      >         relocation and the shared vendored dep should work
>>>>> >     elsewhere in
>>>>> >      >         the project
>>>>> >      >
>>>>> >      >
>>>>> >      > We could publish the protos and treat them as "external"
>>>>> >     dependencies
>>>>> >      > within the Java projects which would also remove this pain
>>>>> point.
>>>>> >
>>>>>
>>>>

Reply via email to