It looks like we are agreeing to make each vendored dependency self
contained and have all their own internal dependencies packaged. For
example, gRPC and all its transitive dependencies would use
org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
dependencies would use org.apache.beam.vendored.calcite.vZZZ.

I also wanted to circle back on this question I had earlier that didn't
have any follow-up:
Currently we are relocating code depending on the version string. If the
major version is >= 1, we use only the major version within the package
string and rely on semantic versioning provided by the dependency to not
break people. If the major version is 0, we assume the dependency is
unstable and use the full version as part of the package string during
relocation.

The downside of using the full version string for relocated packages:
1) Users will end up with multiple copies of dependencies that differ only
by the minor or patch version increasing the size of their application.
2) Bumping up the version of a dependency now requires the import statement
in all java files to be updated (not too difficult with some sed/grep
skills)

The upside of using the full version string in the relocated package:
1) We don't have to worry about whether a dependency maintains semantic
versioning which means our users won't have to worry about that either.
2) This increases the odds that a user will load multiple slightly
different versions of the same dependency which is known to be incompatible
in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
Netty 4.1.28 even though they are both shaded due to issues of how JNI with
tcnative works).

My preference would be to use the full version string for import statements
(so org.apache.beam.vendor.grpc.v1_13_1...) since this would allow multiple
copies to not conflict with each other since in my opinion it is a lot more
difficult to help a user debug a dependency issue then to use string
replacement during dependency upgrades to fix import statements. Also I
would suggest we name the artifacts in Maven as follows:
groupId: org.apache.beam
artifactId: beam-vendor-grpc-v1_13_1
version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
only for patch upgrades that fix any shading issues we may have had when
producing the vendored jar)


On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <[email protected]> wrote:

> Would also keep it simple and optimize for the JAR size only if necessary.
>
> On 24.10.18 00:06, Kenneth Knowles wrote:
> > I think it makes sense for each vendored dependency to be self-contained
> > as much as possible. It should keep it fairly simple. Things that cross
> > their API surface cannot be hidden, of course. Jar size is not a concern
> > IMO.
> >
> > Kenn
> >
> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     How should we handle the transitive dependencies of the things we
> >     want to vendor?
> >
> >     For example we use gRPC which depends on Guava 20 and we also use
> >     Calcite which depends on Guava 19.
> >
> >     Should the vendored gRPC/Calcite/... be self-contained so it
> >     contains all its dependencies, hence vendored gRPC would contain
> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
> >     different namespaces)?
> >     This leads to larger jars but less vendored dependencies to maintain.
> >
> >     Or should we produce a vendored library for those that we want to
> >     share, e.g. Guava 20 that could be reused across multiple vendored
> >     libraries?
> >     Makes the vendoring process slightly more complicated, more
> >     dependencies to maintain, smaller jars.
> >
> >     Or should we produce a vendored library for each dependency?
> >     Lots of vendoring needed, likely tooling required to be built to
> >     maintain this.
> >
> >
> >
> >
> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         I actually created the subtasks by finding things shaded by at
> >         least one module. I think each one should definitely have an on
> >         list discussion that clarifies the target artifact, namespace,
> >         version, possible complications, etc.
> >
> >         My impression is that many many modules shade only Guava. So for
> >         build time and simplification that is a big win.
> >
> >         Kenn
> >
> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]
> >         <mailto:[email protected]>> wrote:
> >
> >             +1 for separate artifacts
> >
> >             I would request that we explicitly discuss and agree which
> >             dependencies we vendor though.
> >
> >             Not everything listed in the JIRA subtasks is currently
> >             relocated.
> >
> >             Thomas
> >
> >
> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
> >             <[email protected] <mailto:[email protected]>>
> >             wrote:
> >
> >                 +1 This should improve build times a lot. It would be
> >                 great if vendored deps could stay in the main repository.
> >
> >                 D.
> >
> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
> >                 <[email protected] <mailto:[email protected]>> wrote:
> >
> >                     Looks great, Kenn!
> >
> >                      > Max: what is the story behind having a separate
> >                     flink-shaded repo? Did that make it easier to manage
> >                     in some way?
> >
> >                     Better separation of concerns, but I don't think
> >                     releasing the shaded
> >                     artifacts from the main repo is a problem. I'd even
> >                     prefer not to split
> >                     up the repo because it makes updates to the vendored
> >                     dependencies
> >                     slightly easier.
> >
> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
> >                      > OK, I've filed
> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
> >                      > collect sub-tasks. This has enough upsides
> >                     throughout lots of areas of
> >                      > the project that even though it is not glamorous
> >                     it seems pretty
> >                      > valuable to start on immediately. And I want to
> >                     find out if there's a
> >                      > pitfall lurking.
> >                      >
> >                      > Max: what is the story behind having a separate
> >                     flink-shaded repo? Did
> >                      > that make it easier to manage in some way?
> >                      >
> >                      > Kenn
> >                      >
> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
> >                     Michels <[email protected] <mailto:[email protected]>
> >                      > <mailto:[email protected] <mailto:[email protected]>>>
> >                     wrote:
> >                      >
> >                      >     +1 for publishing vendored Jars
> >                     independently. It will improve build
> >                      >     time and ease IntelliJ integration.
> >                      >
> >                      >     Flink also publishes shaded dependencies
> >                     separately:
> >                      >
> >                      >     - https://github.com/apache/flink-shaded
> >                      >     -
> >                     https://issues.apache.org/jira/browse/FLINK-6529
> >                      >
> >                      >     AFAIK their main motivation was to get rid of
> >                     duplicate shaded classes
> >                      >     on the classpath. We don't appear to have
> >                     that problem because we
> >                      >     already have a separate "vendor" project.
> >                      >
> >                      >      >  - With shading, it is hard (impossible?)
> >                     to step into dependency
> >                      >     code in IntelliJ's debugger, because the
> >                     actual symbol at runtime
> >                      >     does not match what is in the external jars
> >                      >
> >                      >     This would be solved by releasing the sources
> >                     of the shaded jars.
> >                      >      From a
> >                      >     legal perspective, this could be problematic
> >                     as alluded to here:
> >                      > https://github.com/apache/flink-shaded/issues/25
> >                      >
> >                      >     -Max
> >                      >
> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
> >                      >      > I have tried several times to improve the
> >                     build system and intellij
> >                      >      > integration and each attempt ended with
> >                     little progress when dealing
> >                      >      > with vendored code. My latest attempt has
> >                     been the most promising
> >                      >     where
> >                      >      > I take the vendored classes/jars and
> >                     decompile them generating the
> >                      >      > source that Intellij can then use. I have
> >                     a branch[1] that
> >                      >     demonstrates
> >                      >      > the idea. It works pretty well (and up
> >                     until a change where we
> >                      >     started
> >                      >      > vendoring gRPC, was impractical to do.
> >                     Instructions to try it out
> >                      >     are:
> >                      >      >
> >                      >      > // Clean up any remnants of prior
> >                     builds/intellij projects
> >                      >      > git clean -fdx
> >                      >      > // Generated the source for
> >                     vendored/shaded modules
> >                      >      > ./gradlew decompile
> >                      >      >
> >                      >      > // Remove the "generated" Java sources for
> >                     protos so they don't
> >                      >     conflict with the decompiled sources.
> >                      >      > rm -rf
> >                     model/pipeline/build/generated/source/proto
> >                      >      > rm -rf
> >                     model/job-management/build/generated/source/proto
> >                      >      > rm -rf
> >                     model/fn-execution/build/generated/source/proto
> >                      >      > // Import the project into Intellij, most
> >                     code completion now
> >                      >     works still some issues with a few classes.
> >                      >      > // Note that the Java decompiler doesn't
> >                     generate valid source so
> >                      >     still need to delegate to Gradle for
> >                     build/run/test actions
> >                      >      > // Other decompilers may do a better/worse
> >                     job but haven't tried
> >                      >     them.
> >                      >      >
> >                      >      >
> >                      >      > The problems that I face are that the
> >                     generated Java source from the
> >                      >      > protos and the decompiled source from the
> >                     compiled version of that
> >                      >      > source post shading are both being
> >                     imported as content roots and
> >                      >     then
> >                      >      > conflict. Also, the CFR decompiler isn't
> >                     producing valid source, if
> >                      >      > people could try others and report their
> >                     mileage, we may find one
> >                      >     that
> >                      >      > works and then we would be able to use
> >                     intellij to build/run our
> >                      >     code
> >                      >      > and not need to delegate all our
> >                     build/run/test actions to Gradle.
> >                      >      >
> >                      >      > After all these attempts I have done,
> >                     vendoring the dependencies
> >                      >     outside
> >                      >      > of the project seems like a sane approach
> >                     and unless someone
> >                      >     wants to
> >                      >      > take a stab at the best progress I have
> >                     made above, I would go
> >                      >     with what
> >                      >      > Kenn is suggesting even though it will
> >                     mean that we will need to
> >                      >     perform
> >                      >      > releases every time we want to change the
> >                     version of one of our
> >                      >     vendored
> >                      >      > dependencies.
> >                      >      >
> >                      >      > 1:
> >
> https://github.com/lukecwik/incubator-beam/tree/intellij
> >                      >      >
> >                      >      >
> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
> >                     Knowles <[email protected] <mailto:[email protected]>
> >                      >     <mailto:[email protected] <mailto:
> [email protected]>>
> >                      >      > <mailto:[email protected]
> >                     <mailto:[email protected]> <mailto:[email protected]
> >                     <mailto:[email protected]>>>> wrote:
> >                      >      >
> >                      >      >     Another reason to push on this is to
> >                     get build times down.
> >                      >     Once only
> >                      >      >     generated proto classes use the shadow
> >                     plugin we'll cut the build
> >                      >      >     time in ~half? And there is no reason
> >                     to constantly re-vendor.
> >                      >      >
> >                      >      >     Kenn
> >                      >      >
> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
> >                     Kenneth Knowles
> >                      >     <[email protected] <mailto:[email protected]>
> >                     <mailto:[email protected] <mailto:[email protected]>>
> >                      >      >     <mailto:[email protected]
> >                     <mailto:[email protected]> <mailto:[email protected]
> >                     <mailto:[email protected]>>>> wrote:
> >                      >      >
> >                      >      >         Hi all,
> >                      >      >
> >                      >      >         A while ago we had pretty good
> >                     consensus that we should
> >                      >     vendor
> >                      >      >         ("pre-shade") specific
> >                     dependencies, and there's start on it
> >                      >      >         here:
> >                     https://github.com/apache/beam/tree/master/vendor
> >                      >      >
> >                      >      >         IntelliJ notes:
> >                      >      >
> >                      >      >           - With shading, it is hard
> >                     (impossible?) to step into
> >                      >      >         dependency code in IntelliJ's
> >                     debugger, because the actual
> >                      >      >         symbol at runtime does not match
> >                     what is in the external jars
> >                      >      >
> >                      >      >
> >                      >      > Intellij can step through the classes if
> >                     they were published
> >                      >     outside the
> >                      >      > project since it can decompile them. The
> >                     source won't be legible.
> >                      >      > Decompiling the source as in my example
> >                     effectively shows the
> >                      >     same issue.
> >                      >      >
> >                      >      >           - With vendoring, if the
> >                     vendored dependencies are part
> >                      >     of the
> >                      >      >         project then IntelliJ gets
> >                     confused because it operates on
> >                      >      >         source, not the produced jars
> >                      >      >
> >                      >      >
> >                      >      > Yes, I tried several ways to get intellij
> >                     to ignore the source
> >                      >     and use
> >                      >      > the output jars but no luck.
> >                      >      >
> >                      >      >         The second one has a quick fix for
> >                     most cases*: don't
> >                      >     make the
> >                      >      >         vendored dep a subproject, but
> >                     just separately build and
> >                      >     publish
> >                      >      >         it. Since a vendored dependency
> >                     should change much more
> >                      >      >         infrequently (or if we bake the
> >                     version into the name,
> >                      >     ~never)
> >                      >      >         this means we publish once and
> >                     save headache and build
> >                      >     time forever.
> >                      >      >
> >                      >      >         WDYT? Have I overlooked something?
> >                     How about we set up
> >                      >     vendored
> >                      >      >         versions of guava, protobuf, grpc,
> >                     and publish them. We don't
> >                      >      >         have to actually start using them
> >                     yet, and can do it
> >                      >     incrementally.
> >                      >      >
> >                      >      >
> >                      >      > Currently we are relocating code depending
> >                     on the version string.
> >                      >     If the
> >                      >      > major version is >= 1, we use only the
> >                     major version within the
> >                      >     package
> >                      >      > string and rely on semantic versioning
> >                     provided by the dependency
> >                      >     to not
> >                      >      > break people. If the major version is 0,
> >                     we assume the dependency is
> >                      >      > unstable and use the full version as part
> >                     of the package string
> >                      >     during
> >                      >      > relocation.
> >                      >      >
> >                      >      > The downside of using the full version
> >                     string for relocated packages:
> >                      >      > 1) Users will end up with multiple copies
> >                     of dependencies that
> >                      >     differ
> >                      >      > only by the minor or patch version
> >                     increasing the size of their
> >                      >     application.
> >                      >      > 2) Bumping up the version of a dependency
> >                     now requires the import
> >                      >      > statement in all java files to be updated
> >                     (not too difficult with
> >                      >     some
> >                      >      > sed/grep skills)
> >                      >      >
> >                      >      > The upside of using the full version
> >                     string in the relocated package:
> >                      >      > 1) We don't have to worry about whether a
> >                     dependency maintains
> >                      >     semantic
> >                      >      > versioning which means our users won't
> >                     have to worry about that
> >                      >     either.
> >                      >      > 2) This increases the odds that a user
> >                     will load multiple slightly
> >                      >      > different versions of the same dependency
> >                     which is known to be
> >                      >      > incompatible in certain situations (e.g.
> >                     Netty 4.1.25 can't be on
> >                      >     the
> >                      >      > classpath with Netty 4.1.28 even though
> >                     they are both shaded due to
> >                      >      > issues of how JNI with tcnative works).
> >                      >      >
> >                      >      >
> >                      >      >         (side note: what do other projects
> >                     like Flink do?)
> >                      >      >
> >                      >      >         Kenn
> >                      >      >
> >                      >      >         *for generated proto classes, they
> >                     need to be altered after
> >                      >      >         being generated so shading happens
> >                     there, but actually only
> >                      >      >         relocation and the shared vendored
> >                     dep should work
> >                      >     elsewhere in
> >                      >      >         the project
> >                      >      >
> >                      >      >
> >                      >      > We could publish the protos and treat them
> >                     as "external"
> >                      >     dependencies
> >                      >      > within the Java projects which would also
> >                     remove this pain point.
> >                      >
> >
>

Reply via email to