OK. I just opened https://github.com/apache/beam/pull/6809 to push Guava
through. I made some comments there, and also I agree with Luke that full
version string makes sense. For this purpose it seems easy and fine to do a
search/replace to swap 20.0 for 20.1, and compatibility between them should
not be a concern.
I have minor suggestions and clarifications:
- Is there value to `beam` in the artifactId? I would leave it off unless
there's a special need
- Users should never use these and we make it extremely clear they are not
supported for any reasons
- Use 0.x versions indicating no intention of semantic versioning
Bringing my comments and Luke's together, here's the proposal:
groupId: org.apache.beam
artifactId: vendored-guava-20_0
namespace: org.apache.beam.vendored.guava.v20_0
version: 0.1
Alternatively it could be
groupId: org.apache.beam-vendored
artifactid: guava-20_0
namespace: org.apache.beam.vendored.guava.v20_0
version: 0.1
I like the latter but I haven't gone through the process of establishing a
new groupId.
And for now we do not publish source jars. A couple of TODOs to get the
build in good shape (classifiers, jars, interaction with plugins)
Kenn
On Wed, Oct 24, 2018 at 10:13 AM Lukasz Cwik <[email protected]> wrote:
> It looks like we are agreeing to make each vendored dependency self
> contained and have all their own internal dependencies packaged. For
> example, gRPC and all its transitive dependencies would use
> org.apache.beam.vendored.grpc.vYYY and Calcite and all its transitive
> dependencies would use org.apache.beam.vendored.calcite.vZZZ.
>
> I also wanted to circle back on this question I had earlier that didn't
> have any follow-up:
> Currently we are relocating code depending on the version string. If the
> major version is >= 1, we use only the major version within the package
> string and rely on semantic versioning provided by the dependency to not
> break people. If the major version is 0, we assume the dependency is
> unstable and use the full version as part of the package string during
> relocation.
>
> The downside of using the full version string for relocated packages:
> 1) Users will end up with multiple copies of dependencies that differ only
> by the minor or patch version increasing the size of their application.
> 2) Bumping up the version of a dependency now requires the import
> statement in all java files to be updated (not too difficult with some
> sed/grep skills)
>
> The upside of using the full version string in the relocated package:
> 1) We don't have to worry about whether a dependency maintains semantic
> versioning which means our users won't have to worry about that either.
> 2) This increases the odds that a user will load multiple slightly
> different versions of the same dependency which is known to be incompatible
> in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
> Netty 4.1.28 even though they are both shaded due to issues of how JNI with
> tcnative works).
>
> My preference would be to use the full version string for import
> statements (so org.apache.beam.vendor.grpc.v1_13_1...) since this would
> allow multiple copies to not conflict with each other since in my opinion
> it is a lot more difficult to help a user debug a dependency issue then to
> use string replacement during dependency upgrades to fix import statements.
> Also I would suggest we name the artifacts in Maven as follows:
> groupId: org.apache.beam
> artifactId: beam-vendor-grpc-v1_13_1
> version: 1.0.0 (first version and subsequent versions such as 1.0.1 are
> only for patch upgrades that fix any shading issues we may have had when
> producing the vendored jar)
>
>
> On Wed, Oct 24, 2018 at 6:01 AM Maximilian Michels <[email protected]> wrote:
>
>> Would also keep it simple and optimize for the JAR size only if necessary.
>>
>> On 24.10.18 00:06, Kenneth Knowles wrote:
>> > I think it makes sense for each vendored dependency to be
>> self-contained
>> > as much as possible. It should keep it fairly simple. Things that cross
>> > their API surface cannot be hidden, of course. Jar size is not a
>> concern
>> > IMO.
>> >
>> > Kenn
>> >
>> > On Tue, Oct 23, 2018 at 9:05 AM Lukasz Cwik <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> > How should we handle the transitive dependencies of the things we
>> > want to vendor?
>> >
>> > For example we use gRPC which depends on Guava 20 and we also use
>> > Calcite which depends on Guava 19.
>> >
>> > Should the vendored gRPC/Calcite/... be self-contained so it
>> > contains all its dependencies, hence vendored gRPC would contain
>> > Guava 20 and vendored Calcite would contain Guava 19 (both under
>> > different namespaces)?
>> > This leads to larger jars but less vendored dependencies to
>> maintain.
>> >
>> > Or should we produce a vendored library for those that we want to
>> > share, e.g. Guava 20 that could be reused across multiple vendored
>> > libraries?
>> > Makes the vendoring process slightly more complicated, more
>> > dependencies to maintain, smaller jars.
>> >
>> > Or should we produce a vendored library for each dependency?
>> > Lots of vendoring needed, likely tooling required to be built to
>> > maintain this.
>> >
>> >
>> >
>> >
>> > On Tue, Oct 23, 2018 at 8:46 AM Kenneth Knowles <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> > I actually created the subtasks by finding things shaded by at
>> > least one module. I think each one should definitely have an on
>> > list discussion that clarifies the target artifact, namespace,
>> > version, possible complications, etc.
>> >
>> > My impression is that many many modules shade only Guava. So for
>> > build time and simplification that is a big win.
>> >
>> > Kenn
>> >
>> > On Tue, Oct 23, 2018, 08:16 Thomas Weise <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> > +1 for separate artifacts
>> >
>> > I would request that we explicitly discuss and agree which
>> > dependencies we vendor though.
>> >
>> > Not everything listed in the JIRA subtasks is currently
>> > relocated.
>> >
>> > Thomas
>> >
>> >
>> > On Tue, Oct 23, 2018 at 8:04 AM David Morávek
>> > <[email protected] <mailto:[email protected]>>
>> > wrote:
>> >
>> > +1 This should improve build times a lot. It would be
>> > great if vendored deps could stay in the main
>> repository.
>> >
>> > D.
>> >
>> > On Tue, Oct 23, 2018 at 12:21 PM Maximilian Michels
>> > <[email protected] <mailto:[email protected]>> wrote:
>> >
>> > Looks great, Kenn!
>> >
>> > > Max: what is the story behind having a separate
>> > flink-shaded repo? Did that make it easier to manage
>> > in some way?
>> >
>> > Better separation of concerns, but I don't think
>> > releasing the shaded
>> > artifacts from the main repo is a problem. I'd even
>> > prefer not to split
>> > up the repo because it makes updates to the vendored
>> > dependencies
>> > slightly easier.
>> >
>> > On 23.10.18 03:25, Kenneth Knowles wrote:
>> > > OK, I've filed
>> > https://issues.apache.org/jira/browse/BEAM-5819 to
>> > > collect sub-tasks. This has enough upsides
>> > throughout lots of areas of
>> > > the project that even though it is not glamorous
>> > it seems pretty
>> > > valuable to start on immediately. And I want to
>> > find out if there's a
>> > > pitfall lurking.
>> > >
>> > > Max: what is the story behind having a separate
>> > flink-shaded repo? Did
>> > > that make it easier to manage in some way?
>> > >
>> > > Kenn
>> > >
>> > > On Mon, Oct 22, 2018 at 2:55 AM Maximilian
>> > Michels <[email protected] <mailto:[email protected]>
>> > > <mailto:[email protected] <mailto:[email protected]>>>
>> > wrote:
>> > >
>> > > +1 for publishing vendored Jars
>> > independently. It will improve build
>> > > time and ease IntelliJ integration.
>> > >
>> > > Flink also publishes shaded dependencies
>> > separately:
>> > >
>> > > - https://github.com/apache/flink-shaded
>> > > -
>> > https://issues.apache.org/jira/browse/FLINK-6529
>> > >
>> > > AFAIK their main motivation was to get rid of
>> > duplicate shaded classes
>> > > on the classpath. We don't appear to have
>> > that problem because we
>> > > already have a separate "vendor" project.
>> > >
>> > > > - With shading, it is hard (impossible?)
>> > to step into dependency
>> > > code in IntelliJ's debugger, because the
>> > actual symbol at runtime
>> > > does not match what is in the external jars
>> > >
>> > > This would be solved by releasing the sources
>> > of the shaded jars.
>> > > From a
>> > > legal perspective, this could be problematic
>> > as alluded to here:
>> > > https://github.com/apache/flink-shaded/issues/25
>> > >
>> > > -Max
>> > >
>> > > On 20.10.18 01:11, Lukasz Cwik wrote:
>> > > > I have tried several times to improve the
>> > build system and intellij
>> > > > integration and each attempt ended with
>> > little progress when dealing
>> > > > with vendored code. My latest attempt has
>> > been the most promising
>> > > where
>> > > > I take the vendored classes/jars and
>> > decompile them generating the
>> > > > source that Intellij can then use. I have
>> > a branch[1] that
>> > > demonstrates
>> > > > the idea. It works pretty well (and up
>> > until a change where we
>> > > started
>> > > > vendoring gRPC, was impractical to do.
>> > Instructions to try it out
>> > > are:
>> > > >
>> > > > // Clean up any remnants of prior
>> > builds/intellij projects
>> > > > git clean -fdx
>> > > > // Generated the source for
>> > vendored/shaded modules
>> > > > ./gradlew decompile
>> > > >
>> > > > // Remove the "generated" Java sources for
>> > protos so they don't
>> > > conflict with the decompiled sources.
>> > > > rm -rf
>> > model/pipeline/build/generated/source/proto
>> > > > rm -rf
>> > model/job-management/build/generated/source/proto
>> > > > rm -rf
>> > model/fn-execution/build/generated/source/proto
>> > > > // Import the project into Intellij, most
>> > code completion now
>> > > works still some issues with a few classes.
>> > > > // Note that the Java decompiler doesn't
>> > generate valid source so
>> > > still need to delegate to Gradle for
>> > build/run/test actions
>> > > > // Other decompilers may do a better/worse
>> > job but haven't tried
>> > > them.
>> > > >
>> > > >
>> > > > The problems that I face are that the
>> > generated Java source from the
>> > > > protos and the decompiled source from the
>> > compiled version of that
>> > > > source post shading are both being
>> > imported as content roots and
>> > > then
>> > > > conflict. Also, the CFR decompiler isn't
>> > producing valid source, if
>> > > > people could try others and report their
>> > mileage, we may find one
>> > > that
>> > > > works and then we would be able to use
>> > intellij to build/run our
>> > > code
>> > > > and not need to delegate all our
>> > build/run/test actions to Gradle.
>> > > >
>> > > > After all these attempts I have done,
>> > vendoring the dependencies
>> > > outside
>> > > > of the project seems like a sane approach
>> > and unless someone
>> > > wants to
>> > > > take a stab at the best progress I have
>> > made above, I would go
>> > > with what
>> > > > Kenn is suggesting even though it will
>> > mean that we will need to
>> > > perform
>> > > > releases every time we want to change the
>> > version of one of our
>> > > vendored
>> > > > dependencies.
>> > > >
>> > > > 1:
>> >
>> https://github.com/lukecwik/incubator-beam/tree/intellij
>> > > >
>> > > >
>> > > > On Fri, Oct 19, 2018 at 10:43 AM Kenneth
>> > Knowles <[email protected] <mailto:[email protected]>
>> > > <mailto:[email protected] <mailto:
>> [email protected]>>
>> > > > <mailto:[email protected]
>> > <mailto:[email protected]> <mailto:[email protected]
>> > <mailto:[email protected]>>>> wrote:
>> > > >
>> > > > Another reason to push on this is to
>> > get build times down.
>> > > Once only
>> > > > generated proto classes use the shadow
>> > plugin we'll cut the build
>> > > > time in ~half? And there is no reason
>> > to constantly re-vendor.
>> > > >
>> > > > Kenn
>> > > >
>> > > > On Fri, Oct 19, 2018 at 10:39 AM
>> > Kenneth Knowles
>> > > <[email protected] <mailto:[email protected]>
>> > <mailto:[email protected] <mailto:[email protected]>>
>> > > > <mailto:[email protected]
>> > <mailto:[email protected]> <mailto:[email protected]
>> > <mailto:[email protected]>>>> wrote:
>> > > >
>> > > > Hi all,
>> > > >
>> > > > A while ago we had pretty good
>> > consensus that we should
>> > > vendor
>> > > > ("pre-shade") specific
>> > dependencies, and there's start on it
>> > > > here:
>> > https://github.com/apache/beam/tree/master/vendor
>> > > >
>> > > > IntelliJ notes:
>> > > >
>> > > > - With shading, it is hard
>> > (impossible?) to step into
>> > > > dependency code in IntelliJ's
>> > debugger, because the actual
>> > > > symbol at runtime does not match
>> > what is in the external jars
>> > > >
>> > > >
>> > > > Intellij can step through the classes if
>> > they were published
>> > > outside the
>> > > > project since it can decompile them. The
>> > source won't be legible.
>> > > > Decompiling the source as in my example
>> > effectively shows the
>> > > same issue.
>> > > >
>> > > > - With vendoring, if the
>> > vendored dependencies are part
>> > > of the
>> > > > project then IntelliJ gets
>> > confused because it operates on
>> > > > source, not the produced jars
>> > > >
>> > > >
>> > > > Yes, I tried several ways to get intellij
>> > to ignore the source
>> > > and use
>> > > > the output jars but no luck.
>> > > >
>> > > > The second one has a quick fix for
>> > most cases*: don't
>> > > make the
>> > > > vendored dep a subproject, but
>> > just separately build and
>> > > publish
>> > > > it. Since a vendored dependency
>> > should change much more
>> > > > infrequently (or if we bake the
>> > version into the name,
>> > > ~never)
>> > > > this means we publish once and
>> > save headache and build
>> > > time forever.
>> > > >
>> > > > WDYT? Have I overlooked something?
>> > How about we set up
>> > > vendored
>> > > > versions of guava, protobuf, grpc,
>> > and publish them. We don't
>> > > > have to actually start using them
>> > yet, and can do it
>> > > incrementally.
>> > > >
>> > > >
>> > > > Currently we are relocating code depending
>> > on the version string.
>> > > If the
>> > > > major version is >= 1, we use only the
>> > major version within the
>> > > package
>> > > > string and rely on semantic versioning
>> > provided by the dependency
>> > > to not
>> > > > break people. If the major version is 0,
>> > we assume the dependency is
>> > > > unstable and use the full version as part
>> > of the package string
>> > > during
>> > > > relocation.
>> > > >
>> > > > The downside of using the full version
>> > string for relocated packages:
>> > > > 1) Users will end up with multiple copies
>> > of dependencies that
>> > > differ
>> > > > only by the minor or patch version
>> > increasing the size of their
>> > > application.
>> > > > 2) Bumping up the version of a dependency
>> > now requires the import
>> > > > statement in all java files to be updated
>> > (not too difficult with
>> > > some
>> > > > sed/grep skills)
>> > > >
>> > > > The upside of using the full version
>> > string in the relocated package:
>> > > > 1) We don't have to worry about whether a
>> > dependency maintains
>> > > semantic
>> > > > versioning which means our users won't
>> > have to worry about that
>> > > either.
>> > > > 2) This increases the odds that a user
>> > will load multiple slightly
>> > > > different versions of the same dependency
>> > which is known to be
>> > > > incompatible in certain situations (e.g.
>> > Netty 4.1.25 can't be on
>> > > the
>> > > > classpath with Netty 4.1.28 even though
>> > they are both shaded due to
>> > > > issues of how JNI with tcnative works).
>> > > >
>> > > >
>> > > > (side note: what do other projects
>> > like Flink do?)
>> > > >
>> > > > Kenn
>> > > >
>> > > > *for generated proto classes, they
>> > need to be altered after
>> > > > being generated so shading happens
>> > there, but actually only
>> > > > relocation and the shared vendored
>> > dep should work
>> > > elsewhere in
>> > > > the project
>> > > >
>> > > >
>> > > > We could publish the protos and treat them
>> > as "external"
>> > > dependencies
>> > > > within the Java projects which would also
>> > remove this pain point.
>> > >
>> >
>>
>