OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
through. I made some comments there, and also I agree with Luke that full
version string makes sense. For this purpose it seems easy and fine to do a
search/replace to swap 20.0 for 20.1, and compatibility between them should
not be a concern.

I have minor suggestions and clarifications:

 - Is there value to `beam` in the artifactId? I would leave it off unless
there's a special need
 - Users should never use these and we make it extremely clear they are not
supported for any reasons
 - Use 0.x versions indicating no intention of semantic versioning

Bringing my comments and Luke's together, here's the proposal:

    groupId: org.apache.beam
    artifactId: vendored-guava-20_0
    namespace: org.apache.beam.vendored.guava.v20_0
    version: 0.1

Alternatively it could be

    groupId: org.apache.beam-vendored
    artifactid: guava-20_0
    namespace: org.apache.beam.vendored.guava.v20_0
    version: 0.1

I like the latter but I haven't gone through the process of establishing a
new groupId.

And for now we do not publish source jars. A couple of TODOs to get the
build in good shape (classifiers, jars, interaction with plugins)

Kenn


On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <[email protected]> wrote:

> It looks like we are agreeing to make each vendored dependency self
> contained and have all their own internal dependencies packaged. For
> example, gRPC and all its transitive dependencies would use
> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>
> I also wanted to circle back on this question I had earlier that didn't
> have any follow-up:
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on semantic versioning provided by the dependency to not
> break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downside of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ only
> by the minor or patch version increasing the size of their application.
> 2) Bumping up the version of a dependency now requires the import
> statement in all java files to be updated (not too difficult with some
> sed/grep skills)
>
> The upside of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency which is known to be incompatible
> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
> tcnative works).
>
> My preference would be to use the full version string for import
> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
> allow multiple copies to not conflict with each other since in my opinion
> it is a lot more difficult to help a user debug a dependency issue then to
> use string replacement during dependency upgrades to fix import statements.
> Also I would suggest we name the artifacts in Maven as follows:
> groupId: org.apache.beam
> artifactId: beam-vendor-grpc-v1_13_1
> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
> only for patch upgrades that fix any shading issues we may have had when
> producing the vendored jar)
>
>
> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <[email protected]> wrote:
>
>> Would also keep it simple and optimize for the JAR size only if necessary.
>>
>> On 24.10.18 00:06, Kenneth Knowles wrote:
>> > I think it makes sense for each vendored dependency to be
>> self-contained
>> > as much as possible. It should keep it fairly simple. Things that cross
>> > their API surface cannot be hidden, of course. Jar size is not a
>> concern
>> > IMO.
>> >
>> > Kenn
>> >
>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     How should we handle the transitive dependencies of the things we
>> >     want to vendor?
>> >
>> >     For example we use gRPC which depends on Guava 20 and we also use
>> >     Calcite which depends on Guava 19.
>> >
>> >     Should the vendored gRPC/Calcite/... be self-contained so it
>> >     contains all its dependencies, hence vendored gRPC would contain
>> >     Guava 20 and vendored Calcite would contain Guava 19 (both under
>> >     different namespaces)?
>> >     This leads to larger jars but less vendored dependencies to
>> maintain.
>> >
>> >     Or should we produce a vendored library for those that we want to
>> >     share, e.g. Guava 20 that could be reused across multiple vendored
>> >     libraries?
>> >     Makes the vendoring process slightly more complicated, more
>> >     dependencies to maintain, smaller jars.
>> >
>> >     Or should we produce a vendored library for each dependency?
>> >     Lots of vendoring needed, likely tooling required to be built to
>> >     maintain this.
>> >
>> >
>> >
>> >
>> >     On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]
>> >     <mailto:[email protected]>> wrote:
>> >
>> >         I actually created the subtasks by finding things shaded by at
>> >         least one module. I think each one should definitely have an on
>> >         list discussion that clarifies the target artifact, namespace,
>> >         version, possible complications, etc.
>> >
>> >         My impression is that many many modules shade only Guava. So for
>> >         build time and simplification that is a big win.
>> >
>> >         Kenn
>> >
>> >         On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]
>> >         <mailto:[email protected]>> wrote:
>> >
>> >             +1 for separate artifacts
>> >
>> >             I would request that we explicitly discuss and agree which
>> >             dependencies we vendor though.
>> >
>> >             Not everything listed in the JIRA subtasks is currently
>> >             relocated.
>> >
>> >             Thomas
>> >
>> >
>> >             On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>> >             <[email protected] <mailto:[email protected]>>
>> >             wrote:
>> >
>> >                 +1 This should improve build times a lot. It would be
>> >                 great if vendored deps could stay in the main
>> repository.
>> >
>> >                 D.
>> >
>> >                 On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>> >                 <[email protected] <mailto:[email protected]>> wrote:
>> >
>> >                     Looks great, Kenn!
>> >
>> >                      > Max: what is the story behind having a separate
>> >                     flink-shaded repo? Did that make it easier to manage
>> >                     in some way?
>> >
>> >                     Better separation of concerns, but I don't think
>> >                     releasing the shaded
>> >                     artifacts from the main repo is a problem. I'd even
>> >                     prefer not to split
>> >                     up the repo because it makes updates to the vendored
>> >                     dependencies
>> >                     slightly easier.
>> >
>> >                     On 23.10.18 03:25, Kenneth Knowles wrote:
>> >                      > OK, I've filed
>> >                     https://issues.apache.org/jira/browse/BEAM-5819 to
>> >                      > collect sub-tasks. This has enough upsides
>> >                     throughout lots of areas of
>> >                      > the project that even though it is not glamorous
>> >                     it seems pretty
>> >                      > valuable to start on immediately. And I want to
>> >                     find out if there's a
>> >                      > pitfall lurking.
>> >                      >
>> >                      > Max: what is the story behind having a separate
>> >                     flink-shaded repo? Did
>> >                      > that make it easier to manage in some way?
>> >                      >
>> >                      > Kenn
>> >                      >
>> >                      > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>> >                     Michels <[email protected] <mailto:[email protected]>
>> >                      > <mailto:[email protected] <mailto:[email protected]>>>
>> >                     wrote:
>> >                      >
>> >                      >     +1 for publishing vendored Jars
>> >                     independently. It will improve build
>> >                      >     time and ease IntelliJ integration.
>> >                      >
>> >                      >     Flink also publishes shaded dependencies
>> >                     separately:
>> >                      >
>> >                      >     - https://github.com/apache/flink-shaded
>> >                      >     -
>> >                     https://issues.apache.org/jira/browse/FLINK-6529
>> >                      >
>> >                      >     AFAIK their main motivation was to get rid of
>> >                     duplicate shaded classes
>> >                      >     on the classpath. We don't appear to have
>> >                     that problem because we
>> >                      >     already have a separate "vendor" project.
>> >                      >
>> >                      >      >  - With shading, it is hard (impossible?)
>> >                     to step into dependency
>> >                      >     code in IntelliJ's debugger, because the
>> >                     actual symbol at runtime
>> >                      >     does not match what is in the external jars
>> >                      >
>> >                      >     This would be solved by releasing the sources
>> >                     of the shaded jars.
>> >                      >      From a
>> >                      >     legal perspective, this could be problematic
>> >                     as alluded to here:
>> >                      > https://github.com/apache/flink-shaded/issues/25
>> >                      >
>> >                      >     -Max
>> >                      >
>> >                      >     On 20.10.18 01:11, Lukasz Cwik wrote:
>> >                      >      > I have tried several times to improve the
>> >                     build system and intellij
>> >                      >      > integration and each attempt ended with
>> >                     little progress when dealing
>> >                      >      > with vendored code. My latest attempt has
>> >                     been the most promising
>> >                      >     where
>> >                      >      > I take the vendored classes/jars and
>> >                     decompile them generating the
>> >                      >      > source that Intellij can then use. I have
>> >                     a branch[1] that
>> >                      >     demonstrates
>> >                      >      > the idea. It works pretty well (and up
>> >                     until a change where we
>> >                      >     started
>> >                      >      > vendoring gRPC, was impractical to do.
>> >                     Instructions to try it out
>> >                      >     are:
>> >                      >      >
>> >                      >      > // Clean up any remnants of prior
>> >                     builds/intellij projects
>> >                      >      > git clean -fdx
>> >                      >      > // Generated the source for
>> >                     vendored/shaded modules
>> >                      >      > ./gradlew decompile
>> >                      >      >
>> >                      >      > // Remove the "generated" Java sources for
>> >                     protos so they don't
>> >                      >     conflict with the decompiled sources.
>> >                      >      > rm -rf
>> >                     model/pipeline/build/generated/source/proto
>> >                      >      > rm -rf
>> >                     model/job-management/build/generated/source/proto
>> >                      >      > rm -rf
>> >                     model/fn-execution/build/generated/source/proto
>> >                      >      > // Import the project into Intellij, most
>> >                     code completion now
>> >                      >     works still some issues with a few classes.
>> >                      >      > // Note that the Java decompiler doesn't
>> >                     generate valid source so
>> >                      >     still need to delegate to Gradle for
>> >                     build/run/test actions
>> >                      >      > // Other decompilers may do a better/worse
>> >                     job but haven't tried
>> >                      >     them.
>> >                      >      >
>> >                      >      >
>> >                      >      > The problems that I face are that the
>> >                     generated Java source from the
>> >                      >      > protos and the decompiled source from the
>> >                     compiled version of that
>> >                      >      > source post shading are both being
>> >                     imported as content roots and
>> >                      >     then
>> >                      >      > conflict. Also, the CFR decompiler isn't
>> >                     producing valid source, if
>> >                      >      > people could try others and report their
>> >                     mileage, we may find one
>> >                      >     that
>> >                      >      > works and then we would be able to use
>> >                     intellij to build/run our
>> >                      >     code
>> >                      >      > and not need to delegate all our
>> >                     build/run/test actions to Gradle.
>> >                      >      >
>> >                      >      > After all these attempts I have done,
>> >                     vendoring the dependencies
>> >                      >     outside
>> >                      >      > of the project seems like a sane approach
>> >                     and unless someone
>> >                      >     wants to
>> >                      >      > take a stab at the best progress I have
>> >                     made above, I would go
>> >                      >     with what
>> >                      >      > Kenn is suggesting even though it will
>> >                     mean that we will need to
>> >                      >     perform
>> >                      >      > releases every time we want to change the
>> >                     version of one of our
>> >                      >     vendored
>> >                      >      > dependencies.
>> >                      >      >
>> >                      >      > 1:
>> >
>> https://github.com/lukecwik/incubator-beam/tree/intellij
>> >                      >      >
>> >                      >      >
>> >                      >      > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>> >                     Knowles <[email protected] <mailto:[email protected]>
>> >                      >     <mailto:[email protected] <mailto:
>> [email protected]>>
>> >                      >      > <mailto:[email protected]
>> >                     <mailto:[email protected]> <mailto:[email protected]
>> >                     <mailto:[email protected]>>>> wrote:
>> >                      >      >
>> >                      >      >     Another reason to push on this is to
>> >                     get build times down.
>> >                      >     Once only
>> >                      >      >     generated proto classes use the shadow
>> >                     plugin we'll cut the build
>> >                      >      >     time in ~half? And there is no reason
>> >                     to constantly re-vendor.
>> >                      >      >
>> >                      >      >     Kenn
>> >                      >      >
>> >                      >      >     On Fri, Oct 19, 2018 at 10:39 AM
>> >                     Kenneth Knowles
>> >                      >     <[email protected] <mailto:[email protected]>
>> >                     <mailto:[email protected] <mailto:[email protected]>>
>> >                      >      >     <mailto:[email protected]
>> >                     <mailto:[email protected]> <mailto:[email protected]
>> >                     <mailto:[email protected]>>>> wrote:
>> >                      >      >
>> >                      >      >         Hi all,
>> >                      >      >
>> >                      >      >         A while ago we had pretty good
>> >                     consensus that we should
>> >                      >     vendor
>> >                      >      >         ("pre-shade") specific
>> >                     dependencies, and there's start on it
>> >                      >      >         here:
>> >                     https://github.com/apache/beam/tree/master/vendor
>> >                      >      >
>> >                      >      >         IntelliJ notes:
>> >                      >      >
>> >                      >      >           - With shading, it is hard
>> >                     (impossible?) to step into
>> >                      >      >         dependency code in IntelliJ's
>> >                     debugger, because the actual
>> >                      >      >         symbol at runtime does not match
>> >                     what is in the external jars
>> >                      >      >
>> >                      >      >
>> >                      >      > Intellij can step through the classes if
>> >                     they were published
>> >                      >     outside the
>> >                      >      > project since it can decompile them. The
>> >                     source won't be legible.
>> >                      >      > Decompiling the source as in my example
>> >                     effectively shows the
>> >                      >     same issue.
>> >                      >      >
>> >                      >      >           - With vendoring, if the
>> >                     vendored dependencies are part
>> >                      >     of the
>> >                      >      >         project then IntelliJ gets
>> >                     confused because it operates on
>> >                      >      >         source, not the produced jars
>> >                      >      >
>> >                      >      >
>> >                      >      > Yes, I tried several ways to get intellij
>> >                     to ignore the source
>> >                      >     and use
>> >                      >      > the output jars but no luck.
>> >                      >      >
>> >                      >      >         The second one has a quick fix for
>> >                     most cases*: don't
>> >                      >     make the
>> >                      >      >         vendored dep a subproject, but
>> >                     just separately build and
>> >                      >     publish
>> >                      >      >         it. Since a vendored dependency
>> >                     should change much more
>> >                      >      >         infrequently (or if we bake the
>> >                     version into the name,
>> >                      >     ~never)
>> >                      >      >         this means we publish once and
>> >                     save headache and build
>> >                      >     time forever.
>> >                      >      >
>> >                      >      >         WDYT? Have I overlooked something?
>> >                     How about we set up
>> >                      >     vendored
>> >                      >      >         versions of guava, protobuf, grpc,
>> >                     and publish them. We don't
>> >                      >      >         have to actually start using them
>> >                     yet, and can do it
>> >                      >     incrementally.
>> >                      >      >
>> >                      >      >
>> >                      >      > Currently we are relocating code depending
>> >                     on the version string.
>> >                      >     If the
>> >                      >      > major version is >= 1, we use only the
>> >                     major version within the
>> >                      >     package
>> >                      >      > string and rely on semantic versioning
>> >                     provided by the dependency
>> >                      >     to not
>> >                      >      > break people. If the major version is 0,
>> >                     we assume the dependency is
>> >                      >      > unstable and use the full version as part
>> >                     of the package string
>> >                      >     during
>> >                      >      > relocation.
>> >                      >      >
>> >                      >      > The downside of using the full version
>> >                     string for relocated packages:
>> >                      >      > 1) Users will end up with multiple copies
>> >                     of dependencies that
>> >                      >     differ
>> >                      >      > only by the minor or patch version
>> >                     increasing the size of their
>> >                      >     application.
>> >                      >      > 2) Bumping up the version of a dependency
>> >                     now requires the import
>> >                      >      > statement in all java files to be updated
>> >                     (not too difficult with
>> >                      >     some
>> >                      >      > sed/grep skills)
>> >                      >      >
>> >                      >      > The upside of using the full version
>> >                     string in the relocated package:
>> >                      >      > 1) We don't have to worry about whether a
>> >                     dependency maintains
>> >                      >     semantic
>> >                      >      > versioning which means our users won't
>> >                     have to worry about that
>> >                      >     either.
>> >                      >      > 2) This increases the odds that a user
>> >                     will load multiple slightly
>> >                      >      > different versions of the same dependency
>> >                     which is known to be
>> >                      >      > incompatible in certain situations (e.g.
>> >                     Netty 4.1.25 can't be on
>> >                      >     the
>> >                      >      > classpath with Netty 4.1.28 even though
>> >                     they are both shaded due to
>> >                      >      > issues of how JNI with tcnative works).
>> >                      >      >
>> >                      >      >
>> >                      >      >         (side note: what do other projects
>> >                     like Flink do?)
>> >                      >      >
>> >                      >      >         Kenn
>> >                      >      >
>> >                      >      >         *for generated proto classes, they
>> >                     need to be altered after
>> >                      >      >         being generated so shading happens
>> >                     there, but actually only
>> >                      >      >         relocation and the shared vendored
>> >                     dep should work
>> >                      >     elsewhere in
>> >                      >      >         the project
>> >                      >      >
>> >                      >      >
>> >                      >      > We could publish the protos and treat them
>> >                     as "external"
>> >                      >     dependencies
>> >                      >      > within the Java projects which would also
>> >                     remove this pain point.
>> >                      >
>> >
>>
>

Reply via email to